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ABSTRACT 



The multi-backend database system(MBDS) is a research effort conducted 
jointly by the Naval Postgraduate School and Ohio State University with the 
sponsorship of the STARS foundation. The MBDS is designed to overcome the 
capacity growth and performance gain problems of the traditional database sys- 
tems and the single-backend database systems. 

To verify the aforementioned capacity growth and performance claims, a 
benchmarking methodolgy has been formulated in the Naval Postgraduate Thesis. 
"A Performance Measurement Methodology for Software Multiple-Backend Data- 
base Systems" by James R. Vincent. This thesis presents the implementation of 
the methodology in the form of a computer-aided design (CAD) system for the 
generation of test databases and test-transaction mixes for benchmarking MBDS. 
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I. INTRODUCTION 



A. THE BACKGROUND 

Information processing has a special significance in today’s real world 
applications. Government, private, and commercial organizations are increasingly 
dependent on timely, accurate information. Within the computer science 
community, several areas of research have sprouted new. innovative approaches to 
satisfy the information needs of these organizations. Today, numerous variations 
of database systems have flooded the market. Database systems consist of both 
hardware components and software packages designed for fast, accurate, efficient, 
and economical information processing. These systems are better known as 
database management systems (DBMS). 

1. Three Database-System Approaches 

Three database-system approaches have emerged: l) the traditional 

mainframe-based system. 2) the software single-backend system, and 3) the 

software multiple-backend system [Ref. l:pp. 3-5]. The software multiple-backend 

system is designed to overcome the upgrade and performance problems 

experienced with both the traditional and the single-backend systems. This 

system is more unconventional than the first two and is a new kind of database 

computer. The software multiple-backend system is new because it resembles 
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neither the the traditional approach to database management by placing the 
system software in a mainframe computer such as SQL/DS in an IBM 3033. nor 
the more recent approach to database management by utilizing the dedicated 
hardware 
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Figure 1. The Traditional Approach to Database Management 



and software in a single-backend computer such as the Britton-Lee IDM 500. 

In the mainframe-based approach, a database system is characterized 
as an applications program (albeit, a major one) which shares the resources of the 
mainframe computer with other applications programs (depicted in Figure I). In 
the single-backend approach a database system has the exclusive control and 
use of the resources of the entire backend computer (depicted in Figure 2). The 
term backend connotes the 'back' of terminals or general-purpose computers, 
where neither the terminals nor the general-purpose computers (termed the 
hosts) provide the database management services. Instead, the database 
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Figure 2. The Software Single-Backend Approach to Database Management 

management services are provided by the backend computer to the user or user 
programs (transactions) via the host. 

2. Software Multiple-Backend Database Systems 

The new kind of database computers (depicted in Figure 3) is of the 
multiple-backend approach where the database system is not mainframe- 
based and each database system consists of at least one controller and two or 
more backends with their disk systems interconnected by a network. The 
controller software is mainly dedicated to 1) the communication with the hosts 
and the terminals, 2) the scheduling and control of transaction executions by the 
backends, and 3) the routing of the responses coming from the backends back to 
the user. The backend software in each of the multiple backends is identical and 
is responsible for carrying out the primary database operations such as the 
retrieve, insert, delete, and update for the transactions. Examples of the 
multiple-backend approach to database management are the commercial Teradata 
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DBC/1012 system, and the Naval Postgraduate School's experimental multi- 
backend database system (MDBS) (Ref. l:pp. 5-6]. 

Unlike the mainframe-based and single-backend approaches, the 
multiple-backend approach emphasizes great-capacity and high-performance 
database management. Furthermore, it attempts to relate the capacity growth 
and performance gains to the number of backends in the system. In other words, 
when new backends and their disk systems are added to a multiple-backend 




Figure 3. The Software Multiple-Backend Approach to Database Management 
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database computer, an increase in both the capacity and the performance would 



likely be produced. The MDBS system is expandable in terms of tens of backends 
and associated disk systems, and the DBC/1012 is expandable in terms of 
hundreds of backends and disks. Certainly, in the former system, the capacity 
growth and performance can be measured in tens and in the latter in hundreds. 

B. PERFORMANCE AND CAPACITY-GROWTH CLAIMS 

To overcome the performance problems and upgrade issues of the traditional 
mainframe-based approach and of the conventional software single-backend 
approach, the software multiple-backend approach provides performance gains 
through specialization of the database operations on dedicated, multiple backends. 
The system utilizes multiple backends connected in a parallel fashion in order to 
achieve performance gains and capacity growth. The backend controller is 
responsible for supervising the execution of database transactions and for the 
interfacing with the hosts and users. The backends perform the database 
operations with the database stored and evenly distributed across the disk systems 
of the backends. The controller and backends are connected by a communications 
bus. Users access the system either by way of the hosts or through the controller 
directly. 

The two goals of the software multi-backend database system are of course to 
overcome the performance problems and upgrade issues of the traditional 
mainframe-based or the conventional software single-backend database systems. 
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First, by increasing the number of backends, while the size of the database and 
the size of the responses to the transactions remain constant, the database system 
is to produce a reciprocal decrease in the response times of the user transactions. 
Second, by increasing the number of backends proportionally to the increase of 
transaction responses, the database system is to produce invariant response times 
for the user transactions. 

The first goal allows the multiplicity of the backends of the database system 
to be directly related to the performance gains of the database system in terms 
of the response-time reduction. Response-time reduction means the amount of 
reduction in the response time of a request, when the request is processed in a 
system with several backends as opposed to processing the same transaction in a 
one backend system, while using the same test-database set[Ref. l:p. 24]. The 
second goal enables the multiplicity of the backends of the system to be directly 
related to the capacity growth of the system in terms of response-time 
invariance. Response-time invariance means the amount of change in the 
response time of a request, when the request is processed in a one backend system 
with a response set of x records, as opposed to processing the same transaction 
in a system with m backends with a response set of mx records [Ref. l:p. 24]. 
Since the size of the response set for a request is determined by the size of the 
database (i.e.. large databases generate more responses for the same request), the 
definition of response-time invariance can be restated as the amount of change in 
the response time of a request, when the request is processed in a one backend 
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system with a database size of x records, as opposed to processing the same 
transaction in a system with m backends with a database size of mx records. 
(Response set means the set of responses returned by the backends(s) to the user 
as a result of processing a transaction.) 

C. THE BENCHMARK OVERVIEW 

To verify the aforementioned performance and growth-capacity claims. 
Vincent has formulated a benchmarking methodology for software multiple- 
backend database systems. Vincent's work [Ref. 2] provides a logical continuation 
of two prior projects aimed at developing a comprehensive performance 
measurement methodology. Kovalchik f Ref. 3] has developed a set of performance 
measurement tools for conducting system testing. Tekampe and Watson fRef. 4] 
have developed a performance measurement methodology for database systems to 
provide both external and internal performance measurements by embedding 
timing checkpoints in the MBDS software to provide the required measurements. 
The information provided by the external checkpoints provides a measure of the 
response time. 

Vincent's methodology articulates numerous parameters to insure database- 
independence and application-independence when generating the benchmarking 
information. Thus, the design of the benchmarking methodology is complicated. 
There is the need: 
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1) to have test databases which can be used for testing backends of varying 
numbers, for deriving partitions I clusters) of a database, and for placing 
the partitions on parallel stores: 

2) to have test-transaction mixes which can be used for measuring primary 
database operations in terms of their response times, for verifying the 
response-time reductions due to additions of backends and the 
redistribution of the same database, for clarifying the response-time 
invariance on account of various growths in database capacity with 
corresponding additions of backends and backend stores; 

3) to have systematic ways to generate the test databases and the test- 
transaction mixes, to conduct the tests, to collect the test results, to 
interpret the results and to verify the results against established measures 
(benchmarks). 

The major portions of the design of the benchmarking methodology consist of the 
articulation and establishment of the measurement criteria and measures, the 
design, interpretation, and generation of the test databases, the test-transaction 
mixes, the test procedures, and the test configurations. 

The focus of this thesis then is the design and implementation of a 
computer-aided design (CAD) system for the generation of test transactions and 
test databases that may be used for the benchmarking of parallel, multiple- 
backend database systems. One of the most salient features of the CAD system is 
to reduce the amount of user input. The user needs only to input three essential 
elements of information: 

• the number of backends in the system to be tested. 

• the amount of disk storage per backend, and 

• the size of the data transfer from the secondary storage (disk) to the 
primary storage (main memory). 
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Once the user has provided these values, the C AD system automatically generates 
the test databases and the test transaction mixes. Furthermore, it yields a 
detailed report that guides the database evaluator through the testing process. 
The CAD system is also very generic in form, and is able to generate test 
database and transaction mixes for any combination of the input test variables. 

The CAD system described in this thesis generates the test databases, the test 
transaction mixes, and an evaluator's guide for benchmarking parallel, multiple 
backend database systems. This CAD system is a first version: the second version 
will be integrated with MBDS, allowing the testing process to be controlled and 
managed by the CAD system. The third version will add components to collect 
statistics (e.g., response times) for the different tests and calculate statistics (i.e.. 
mean and standard deviation of tests, response-time reductions and response-time 
invariances) that measure the performance of MBDS. 

D. ORGANIZATION OF THE THESIS 

The thesis is organized into three chapters in addition to this introduction. 
Chapter II provides a general overview of the CAD system structure. Chapter III 
presents the detailed design features and focuses on the major components of the 
CAD system: the generated test databases and transaction mixes, and the 
framework for the detailed user's guide (evaluator’s report). Chapter IV presents 
a summary and the conclusions of the thesis, and offers future work in 
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performance evaluation of software multi-backend database systems in general. 



and MBDS in particular. 
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II. GENERAL OVERVIEW 



Chapter II presents a three-part overview of the general structure of the CAD 
system for generating test-databases and test-transaction mixes. Part one 
describes the essence of Vincent's methodology [Ref. 2]. Part two briefly describes 
the prototype multi-backend database system under development at the 
Laboratory for Database Systems Research. Naval Postgraduate School, 
Monterey. California. Finally, part three introduces the framework of the CAD 
system. 

A. TEST-DATABASE AND TEST-TRANSACTION DESIGN FACTORS 

The database design factors presented in Vincent’s thesis [Ref. 2:pp. 29-48] 
are the backbone not only of his methodology, but of the CAD system as well. 
The generated test-database sets and the generated test-transaction mixes are the 
two major components of his performance measurement. Essentially. Vincent’s 
methodology describes how to create these test-database sets and test-transaction 
mixes. The creation of the database sets and the transaction mixes is driven by 
three critical elements of information solicited from the user. 

• the number of backends in the system to be tested. 

• the amount of disk storage per backend, and 

• the size of the data transfer from the secondary storage to the primary 
storage. 



18 



The database design factor.' are the system configuration considerations, the 



database size considerations, and the test-transaction mix considerations. 

1. System Configuration Considerations 

For a given test database, two sets of configurations must be generated, a 
set for the measurement of the response-time reduction, and a set for the 
measurement of the response-time invariance. The number of configurations 
within each set is determined by the number of backends of the system to be 
tested. A configuration may range from 1 backend to M backends. Consider a 
system with M backends and N number of bytes in the database. The database 
of X bytes must be evenly distributed over the backends. Depending on the 
configuration being used, the database must be evenly distributed to 1, 2. 3. . . .. 
or M backends. To determine a database size which permits an equal distribution 
of data to each backend in the system, the least common multiple ( 1cm) must 

be calculated for the possible system configurations of 1. 2. 3 or M backends. 

The 1cm is used in another calculation, the database-size multiple, or dbm. 
The dbm is the ultimate factor used to determine the correct database size which 
allows for equal distribution of the database across M backends. The dbm is 
discussed later. 

The total number of configurations for a given test database is 
determined by the equation: 

Number of configurations = M + (M - l) = 2M - 1 
where M is the total number of backends to be tested in the system. 
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To verify the performance-gain claim, the database must be distributed 



evenly across 1. 2. 3 M backends, with each distribution comprising a 

configuration. For example, given a system with two backends, two configurations 
are need to test the response-time reduction claim. The first configuration has the 
entire database on one backend, and the second configuration has the database 
split evenly between the two backends. Thus for the response-time reduction 
calculations of a M-backend system. M configurations are needed. 

To verify the growth-capacity claim, the database size must increase by a 

factor of 2. 3 M . with each configuration reflecting the incremental increase 

in size up to M. Again, given a system with two backends, two configurations are 
needed to test the response-time invariance claim. The first configuration has the 
entire database on one backend. The second configuration then doubles the size 
of the database and distributes it evenly over two backends. Thus for the 
response-time invariance calculations of a M-backend system. M configurations 
are also needed. Tables 1 and 2 reflect the database allocation with respect to the 
number of configurations. Notice however, that the first configuration for the 
response-time reduction set and the first configuration for the response-time 
invariance set are the same configuration. Eliminating the creation of the 
duplicate configuration, the total number of test configurations is equal to (M + 
M -1) or 2M - 1. Tables 3-5 summarize this information for systems configured 
with a maximum of two. three, and four backends, respectively. 
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TABLE 1. DATABASE ALLOCATION WITH RESPECT TO 
RESPONSE TIME REDUCTION 



Configuration 

Number 


Maximum Number of 
Backends to be used: 


Portion of Database 
Allocated per Backend: 


1 


1 


N 


2 


2 


N/2 


o 


9 

O 


N/S 


4 


4 


N/4 








M 


M 


N/M 


KEY. M = number of backends 

N = total number of bytes in database 



TABLE 2. DATABASE ALLOCATION WITH RESPECT TO 
RESPONSE-TIME INVARIANCE 



Configuration 

Number 


Maximum Number 
of Backends 


Total Database 
Size in Mbytes. 


1 


1 


N 


2 


2 


N*2 


9 

O 


9 

U 


N*3 


4 


4 


N*4 








M 


M 


N*M 


KEY. M = number of backends. 

N = total number of bytes in database. 



2. Database-size Considerations 

To adequately measure the performance characteristics of a software 
multiple-backend system, three different database sizes are preferred (Ref.2: 
p.33]. all of which are a multiple of the base (original) size N. One size represents 
a small database (N/4), another size represents an intermediate size database 
(N/2). and the final size is the base database size N. The base (original) database 
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size X is determined by the database-size multiple, (dbm). The dbm is calculated 



as follows: 



dbm = lcm{l.2.3,....M} x 32 x record-size 
The 1cm (least common multiple) is determined using the maximum number of 
TABLE 3. TEST CONFIGURATIONS WITH TWO BACKENDS. 



Configuration 


Number of 


Mbytes per 


Total database 


Number: 


Backends: 


Backend: 


Size in Mbytes 


1 


1 


N 


N 


2 


2 


N/2 


N 


9 

O 


2 


N 


2N 


Note: 








Configuration's {1.2} are required to verify the response-time reduction claim 
Configuration’s {1,3} are required to verify the response-time invariance claim 



TABLE 4. TEST CONFIGURATIONS WITH THREE BACKENDS. 



Configuration 

Number: 


Number of 
Backends: 


Mbytes per 
Backend: 


Total database 
Size in Mbytes 


1 


1 


N 


N 


2 


2 


N/2 


N 


9 


9 

O 


N 3 


N 


4 


2 


N 


2N 


5 


9 

«J 


N 


3N 


Note: 

Configuration’s {1.2.3} are required to verify the response-time reduction claim 
Configuration’s {1.4,5}* are required to verify the response-time invariance claim 
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TABLE 5. TEST CONFIGURATIONS WITH FOUR BACKENDS. 



Configuration 

Number: 


Number of 
Backends: 


Mbytes per 
Backend: 


Total database 
Size in Mbytes 


1 


1 


N 


N 


2 


2 


N/2 


N 


o 

o 


3 


N/S 


N 


4 


4 


N/4 


N 


5 


2 


N 


2N 


6 


9 

O 


N 


SN 


7 


4 


N 


4N 


Note: 

Configuration’s {1,2, 3, 4} are required to verify the response-time reduction claim 
Configuration’s {1,5, 6, 7} are required to verify the response-time invariance claim 



backends. M. in the system. The record-size value is hardware specific because it 
depends on the size of the unit of data management of the particular system's 
architecture [Ref. 5:pp. 16-17], The size of the unit of data management is the size 
of the data transfer from the secondary storage to the primary storage. This 
element of information is one of three elements of information solicited from the 
user. Strawser [Ref. 5:pp. 16-17] offers a scheme for dividing this disk track size 
into four record sizes. The largest of the four record sizes is used in the dbm 
calculation. The largest record size must also be divisible by each of the three 
smaller record sizes to simplify the database sizing process. The CAD system 
implements Strawser s scheme by having the four record sizes as fixed multiples of 
one another. 

Vincent’s methodology explains the requirement for the database-multiple 
(dbm) to be a multiple of 32 [Ref. 2:p. 44]. Basically, the number 32 is necessary 
because: 
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• the database is to be equally divided among the four record sizes, i.e.. 
divisible by 4. 

• the decision to have the smallest database (N/4) to be one fourth the size the 
base database size N.i.e.. divisible by another 4. and 

• a parameter relating to the MBDS storage mechanism, (i.e.. each MBDS 
cluster must hold an even number of records) i.e., divisible by 2. 

Thus. 4 x 4 x 2 = 32. 

The final criteria for how large the database size may be is of course 
dependent on the available disk storage of the type of backend to be used in the 
system. Obviously, for testing the database size cannot be larger than the 
available disk storage of the single backend. The database-multiple determines 
exactly how much of the available disk storage is used to accommodate the largest 
database size. A trivial example consists of 
dbm = 3 units, 

available disk storage = 10 units, and 
largest database size = 9 units. 

Note that the dbm is able to be folded into the available disk storage 3 times, 
thus allowing for a database size of 9. If the dbm was 4 units, then the largest 
database size would be 8. A more concrete example follows. A system is defined 
to have the following: 



available disk storage: 300 Megabytes 

disk track size: 4000 bytes 

number of backends: 3 
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Strawser’s scheme (based on the disk track size) for selecting record sizes is: 



large record size: 


2000 


bytes 


med-lg record size: 


1000 


bytes 


medium record size: 


400 


bytes 


small record size: 


200 


bytes 



The calculations are: 

lcm = 6 { 1, 2. and 3 backends} 

dbm = lcm x 32 x large record size = 6 x 32 x 2000 = 384,000 bytes 
N in Mbytes = 299.904 where 384.000 can be folded into 300 Mbytes 
781 complete times 



Table 6 reflects the database size calculations when the number of backends is 
allowed to increase. Note the dbm for 11 backends. The dbm itself exceeds the 
available disk storage of 300 Mbytes! Thus, the maximum number of backends 
allowable in the system is 10. The database design factors may be summarized as 
follows: 



1) three database sizes that best measure the performance claim are N, N/2, 
and N/4 . 

2) the number of backends in the system is used to calculate the least 
common denominator, lcm. 

3) the four record sizes are calculated using Strawser’s scheme. These 
calculations are dependent on the disk track size. The largest record size is 
used to help calculate the database-multiple. 

4) the database-multiple is calculated. The dbm is dependent on the lcm and 
the largest record size. 
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TABLE 6. MBDS DATABASE SIZE CALCULATIONS. 



M 


LCM of 
{1 M} 


DBM in 
Bytes 


N in 
Mbytes 


N 2 in 
Mbytes 


N/4 in 
Mbytes 


2 


2 


128,000 


299.904 


149.952 


74.976 


9 

O 


6 


384.000 


299.904 


149.952 


74.976 


4 


12 


768,000 


299.520 


149.760 


74.880 


5 


60 


3,840,000 


299.520 


149.760 


74.880 


6 


60 


3,840,000 


299.520 


149.760 


74.880 


7 


420 


26.880,000 


295.680 


147.840 


73.920 


8 


840 


53.760,000 


268.800 


134.400 


67.200 


9 


2.520 


161.280.000 


161.280 


80.640 


40.320 


10 


2.520 


161,280,000 


161.280 


80.640 


40.320 


11 


27.720 


1.774,080.000 


1,774.080 


887.040 


443.520 



where: 

M = maximum number of backends in the database. 

LCM = Least Common Multiple. 

DBM = (LCM{1 M} * 32 * rec_size) for rec size = 2000-bytes. 

N = Size in Mbytes of large test database. 

N/2 = Size in Mbytes of medium size test database. 

N/4 = Size in Mbytes of small test database. 

Assumption: Largest database allowable is 300 Mbytes. 



5) the values of N. N/2, and N/4 are calculated based on the dbm and the 
available disk storage. 

Vincent's methodology discusses formatting the databases using one of 
two options: (l) the use of only one record type per database, or (2) the inclusion 
of all four record sizes in a single database. Because the CAD system is being 
designed to specifically test the Naval Postgraduate School's experimental 
software multiple-backend system, option (2) is implemented. 

3. Test-Transaction Mix Considerations 

To demonstrate the response-time invariance of software multiple- 
backend systems, the CAD system must ensure that any increase in the number of 
backends in the system is accompanied by a proportional increase in the size of 
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the database, and in the size of the response set returned by the test-transaction 
mix. The selection of the test-transaction mix which permits the database size to 
increase in the same proportion t * the increase in the response set is much more 
complex. The selection requires a complete understanding of the characteristics 
and features of the data model and data manipulation language [Ref. 2:p. 48]. 
Vincent has devised a methodology that creates the test-record organization, a 
test-database structure, and a test-transaction mix set which enables the system 
evaluator to use the same organization, structure, and mix for all system 
configurations without modification! 

Vincent also cites the work of Hawthorn and Stonebreaker [Ref. 6] which 
suggests the use of three sets of test transactions to test the overall performance 
of MBDS. One set consists of overhead-intensive queries, the second set 
consists of data-intensive queries, and the third set consists of multi-relation 
queries. Vincent ensures that all of these factors are appropriately considered 
when selecting transactions to verify the performance-gain and capacity-growth 
claims. 

4. System-Dependent Factors 

All of the design factors mentioned thus far have been independent of the 
system with the exception of the record-size selection. Vincent’s methodology has 
shown that the design for the database set satisfies the requirements for 
accommodating all required test-system configurations: Vincent has also 

demonstrated that the transaction-mix generation is well conceived and 
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appropriately encompasses the requisite consideration^ for verification of the 
performance-gain and capacity-growth claims. However, there are two 
considerations that are system specific, the data model and record composition. 
The data model used by the database system is directly related to the system's 
data management strategy, to include such factors as the directory structure and 
record distribution mechanism. The record composition may rely on specific 
system idioms and constraints, such as limits on field width, or on the number of 
fields within a record, etc. 

B. THE MULTIPLE-BACKEND DATABASE SYSTEM (MBDS) 

In the Laboratory for Database Systems Research, a prototyped software 
multi-backend database system, known as MBDS. has the same architectural 
organization depicted in figure 3 for the software multiple-backend approach to 
database management. One minicomputer or microcomputer functions as the 
controller, while one or more microcomputers and their disk systems serve as 
backends. Both the controller and the backends are interconnected by a 
broadcast bus. Together, the controller, the broadcast bus. and the backends 
comprise a database system which is specifically designed to overcome the the 
performance and capacity-growth problems experienced by traditional 
mainframe-based and conventional software single-backend database systems. 
The data in the MBDS system is distributed across the disk systems of each 
backend computer. Consequently, a user transaction may be executed 
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simultaneously by all backends. The initial design and and analysis of MBDS is 
presented in Reference 7 and Reference 8. 

The MBDS hardware features a minicomputer (VAX-11-750) and 8 micro- 
computers (Integrated Solutions Incorporated workstations). All computer 
systems utilize the 4.2 BSD Unix Operating System. The VAX 11-750 serves as 
the controller, while the eight ISI workstations function as backends. The ISI 
workstation is based on the Motorola 68020 CPU, which features 16- Mbytes of 
virtual memory space per process. Each backend has one dedicated Control Data 
Corporation Winchester-type disk drive with a maximum formatted capacity of 
500-Mbytes per drive. The MBDS configuration uses Ethernet as the broadcast 
bus among the controlled backends. 

Recall there are two considerations that are system specific, the data model 
and record composition. The remainder of this section is devoted to the MBDS 
attribute-based data model and the attribute-based data language. 

1. The Attribute-Based Data Model 

MBDS is based on the attribute-based data model first proposed in 
Reference 9. In the attribute-based data model, data is considered in the following 
constructs: database, file, record, attribute-value pair, keyword, attribute-value 
range, directory keyword, non-directory keyword, director,’, record body, keyword 
predicate, and query. Informally, a database consists of a collection of files. Each 
file contains a group of records which are characterized by a unique set of 
keywords. A record is composed of two parts. The first part is a collection of 
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at tribute- value pairs or keywords. An attribute-value pair is a member of 
the Cartesian product' of the attribute name and the value domain of the 
attribute. As an example. <POPULATION. 25000> is an attribute-value pair 
having 25000 as the value for the population attribute. A record contains at most 
one attribute-value pair for each attribute defined in the database. Certain 
attribute-value pairs of a record (or a file) are called the directory keywords of 
the record (file), because either the attribute-value pairs or their attribute-value 
ranges are kept in a directory for identifying the records (files). Those 
attribute-value pairs which are not kept in the directory are called non- 
directory keywords. The rest of the record is textual information, which is 
referred to as the record body. An example of a record is shown below. 

( <FILE, USCensus>. <CITY, Monterey>, <POPULATION, 25000>, 

{ Temperate climate } ) 

The angle brackets. <,>. enclose an attribute-value pair, i.e., keyword. The curly 
brackets, {.}. include the record body. The first attribute-value pair of all records 
of a file, by convention, is the same. In particular, the attribute is FILE and the 
value is the file name. A record is enclosed in the parenthesis. For example, the 
above sample record is from the USCensus file. 

The records of the database may be identified by keyword predicates. A 
keyword predicate is a 3-tuple consisting of a directory' attribute, a relational 
operator ( = . ! = . >, <, ^). and an attribute value, e.g., POPULATION ^ 20000 

is a keyword predicate. More specifically, it is a greater-than-or-equal-to 
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predicate. Combining keyword predicates in disjunctive normal form 
characterizes a query of the database. The query 

( FILE = USCensus and CITY = Monterey ) or 
( FILE = USCensus and CITY = San Jose ) 

will be satisfied by all records of the USCensus file with the CITY of either 
Monterey or San Jose. For clarity, w-e also employ parentheses for bracketing 
conjunctions in a query. 

2. The Attribute-Based Data Language (ABDL) 

The attribute-based data language supports the five primary database 
operations. INSERT. DELETE, UPDATE. RETRIEVE, and RETRIEVE- 
COMMON. A request in the ABDL is a primary operation with a qualification. 
A qualification is used to specify the part of the database that is to be operated 
on. Two or more requests may be grouped together to. form a transaction. 
Now, let us illustrate the five types of requests and forgo their formal 
specifications. 

The INSERT request is used to insert a new record into the database. 
The qualification of an INSERT request is a list of keywords with or without a 
record body being inserted. In the following example, an INSERT request that 
INSERT (<FILE, USCensus>. <CITY, Cumberland;*, <POPULATION, 40000 >) 

will insert a record without a record body into the USCensus file for the city 
Cumberland with a population of 40.000. 
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A DELETE request is used to remove one or more records from the 
database. The qualification of a DELETE request is a query. The following 
example, 

DELETE ((FILE = USCensus) and (POPULATION > 100000)) 

is a request that will delete all records whose population is greater than 100.000 in 
the USCensus file. 

An L’PDATE request is used to modify records of the database. The 
qualification of an UPDATE request consists of two parts, the query and the 
modifier. The query specifies which records of the database are to be modified. 
The modifier specifies how the records being modified are to be updated. The 
following example, 

UPDATE (FILE = USCensus) (POPULATION = POPULATION + 5000) 

is an UPDATE request that will modify all records of the USCensus file by 
increasing all populations by 5.000. In this example. (FILE = USCensus) is the 
query and (POPL T LATION = POPULATION + 5000) is the modifier. 

The RETRIEVE request is used to retrieve records of the database. The 
qualification of a retrieve request consists of a query, a target-list, and a by-clause. 
The query specifies which records are to be retrieved. The target-list consists of a 
list of output attributes. It may also consist of an aggregate operation, i. e., 
AVG, COUNT, SUM, MIN, MAX. on one or more output attribute values. The 
optional bv-clause may be used to group records when an aggregate operation is 
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optional by-clause may be used to group records when an aggregate operation is 
specified. The RETRIEVE request. 

RETRIEVE ((FILE = USCensus) and (POPULATION ^ 50000)) (CITY, POPULATION) 

will retrieve the city names and populations of all records in the USCensus file 
whose populations are greater than or equal to 50.000. ((FILE = USCensus) and 
(POPULATION ^ 50.000)) is the query and (POPULATION, CITY) is the 
target-list. -There is no use of the by-clause or aggregation in this example. 

Lastly, the RETRIEVE-COMMON request is used to merge two files by 
common attribute-values. Logically, the RETRIEVE-COMMON request can be 
considered as a transaction of two retrieve requests that are processed serially in 
the following general form. 

RETRIEVE (query-1) (target-list- 1) 

COMMON ( attribute- 1. attribute-2) 

RETRIEVE (query-2) (target-list-2) 

The common attributes are attribute-1 (associated with the first retrieve request) 

and attribute-2 (associated with the second retrieve request). In the following 

example, the RETRIEVE-COMMON request 

RETRIEVE ((FILE = CanadaCensus) and (POPULATION > 100000)) (CITY) 

COMMON (POPULATION, POPULATION) 

RETRIEVE ((FILE = USCensus) and (POPULATION > 100000)) (CITY) 

will find all records in the CanadaCensus file with population greater than 

100.000. find all records in the USCensus file with population greater than 

100.000. identify records of respective files whose population figures are common. 
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and return the two city names whose cities have the same population figures. 
ABDL provides five seemingly simple database operations, which are nevertheless 
capable of supporting complex and comprehensive transactions. 

C. THE CAD FRAMEWORK 

The CAD system is being implemented in a number of versions. The first 
version, described in this thesis, generates the test-databases, the test transaction 
mixes, and an evaluator’s guide, but is not integrated with MBDS. The databases, 
the mixes, and the guide are in fact the major components of the CAD system. 
The framework of this first version is to create a "TEST" directory of files for the 
system evaluator to manually input to MBDS. The files that are created by the 
CAD system and passed to the TEST directory represent both the generated test 
databases and the generated transaction mixes. The system evaluator is assisted 
in his testing by the guide provided by the CAD system. 

The framework of the TEST directory for a MBDS configured with up to 3 
backends is represented in Figure 4. Recalling that a 3-backend system requires a 
total of 5 configurations per database, i.e.. 

for M = 3. the number of configurations = 2M - 1 = 2(3) -1 = 5. 

Each database (small (N/4). medium (N/2). and large (N)) has its own set of 
configuration files as well as it own transaction-mix files. The configuration files 
are made up of record files, and are in fact the generated database. The 
transaction-mix file is used to test the performance and growth-capacity claims for 
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the specific database: the transaction-mix hie consists of four sets of hies, each set 
representing queries geared specifically to the record sizes CAD system generates 
and passes to the TEST directory a template hie common to all three generated 
databases. The CAD system also generates a unique descriptor hie for each 
database, small . medium and large. The descriptor hie contains indexing 
information for the database. 



TEST Directory 
(a 3-backend MBDS system) 



/work /username/ TEST 



TEST 




User's Guide 



Large DB (N) 



Medium DB (N /2 



Small DB (N 4) 



Template file 




Transaction 

mixes 



lst.2nd.3rd config 
record file 



4th config 5th config descriptor file 

record file record file 




large red size med large red size medium red size small red size 



NOTE: Each transaction mix consists of 7 sets of queries totaling 30 queries mix/ record size 



The lst,2nd.and 3rd configurations comprise a single record file which is 
the response-time reduction file 



The 4th and 5th configuration record files constitute the response-time invariance files 



Figure 4 The TEST Directory 
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1. Characteristics and Notions of the CAD System 

The CAD system is designed to receive a minimal amount of input from 
the user. Upon receiving 3 essential elements of information from the user, the 
CAD system is to perform the following actions: 

• to make a number of calculations that affect the generation of the 
database sets and the transaction mixes. 

• to create the template file, the 3 descriptor files (one for each database 
size N. N/2. N/4), the three response-time reduction record files (again, 
one for each database size), and the requisite number of response-time 
invariance record files, 

• and to produce an evaluator's guide as a very important by-product. 

The evaluator’s guide is a report that characterizes the generation of the test 
databases and transaction mixes. Also, the guide is the user’s tool to place the 
generated CAD files into MBDS for testing. The most powerful characteristic of 
the CAD system is that just about every separate entity within the CAD system 
is in some form or multiple of another entity. Within the CAD system there are 
two important concepts that serve as the nucleus for aiding in the creation of all 
of the files that are placed in the TEST director}'. 

The first concept involves the creation of a number of factors that serve 
as multipliers for entities. For example, three database sets are generated by the 
CAD system with two sets being multiples of the original, e.g.. N, N/2. and N/4. 
The first concept is to create a database factor to represent the relationships (in 
terms of multiples) among the database sizes. In particular, a data construct 
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named dbase factor represents the ratios of the database sizes. In our 
implementation, dbase factor is an array of three integers. 1.2 and 4. The utility 
of this notion is that the CAD system merely creates one database and does so 
with the dbase factor set to "1". The other two databases are generated by 
simply accessing the appropriate dbase factor from the array. We also apply the 
factor concept to represent relationships among database record sizes. Recall that 
Stawser’s scheme for the record-size selection has 3 of the record sizes as multiples 
of the 4th record size. A record factor is created to represent the relationships 
between the record sizes. Again, in particular, a data construct named 
record factor represents the ratios of the record sizes. In our implementation, 
record factor is an array of four integers. 1. 2. 5 and 10. 

The second concept is a table of numbers that reflects the distribution of 
records across the backends of MBDS . Recall that in MBDS. a cluster-based 
database placement algorithm is used to distribute clusters of records across the 
backends, and that this algorithm is tightly coupled to the attribute-base data 
model. Therefore, we develop a method by which we can successfully model the 
algorithm to insure an even distribution of database records of a cluster across the 
backends. Hence, we instantiate a data structure that can be used to model the 
concept. The name given to this data structure is the 

base record & block distribution table. The numbers placed in this 
"base" table result from a few simple calculations dependent on the user's input. 
The numbers reflect the upper bounds on the number of records, blocks, and 
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clusters formed and distributed across the backends of MBDS. The values in the 



table are used to generate the appropriate number of records needed for a pre- 
calculated database size. They are also used to determine the range values of the 
descriptor files, and they serve to determine values in the transaction mixes as 
well. 

The next chapter presents these two concepts as the building blocks for 
two of the major components of the CAD system, the generated test database sets 
and the generated test-transaction mixes. 
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III. THE MAJOR COMPONENTS OF THE CAD SYSTEM 



The major objective of the CAD system is to generate test database sets and 
corresponding transaction mixes for the purpose of benchmarking parallel, 
multiple-backend database systems. In meeting this objective, the first version of 
the CAD system is comprised of three major components, 

• the Test Database Generation Component, 

• the Test Transaction Mix Generation Component, and 

• the Evaluator’s Report Component. 

Before the CAD system is able to actively create any of the three major 
components, the CAD system initializes a number of data constructs, solicits the 
user for the three essential elements of information, and performs a number of 
initial calculations. Once the aforementioned procedure has been completed, the 
CAD system then generates the test-databases, followed by the generation of the 
test-transaction mixes. The evaluator's report is comprised of a number of text 
files interleaved with a number of empirical data tables generated simultaneously 
with both the test-databases and the test-transaction mixes. This chapter 
presents an in-depth study of the high-level organization of the CAD system, the 
components of the test-databases and test-transaction mixes, and introduces the 
framework of the report component. 
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A. HIGH-LEVEL ORGANIZATION OF THE CAD SYSTEM 



For discussion the high-level organization of the CAD system is presented as 
three areas: (l) the characteristics of CAD, (2) the files created by CAD, and (3) 
the high-level program structure of CAD. The characteristics of the CAD describe 
the mechanisms by which the values found within the files are determined. The 
created files are in fact the generated test databases and the transaction mixes, 
and comprise a part of the evaluator's report. The high-level program structure 
provides an overview of the CAD system itself, 

1. Characteristics of the CAD System 

Chapter II introduced the two concepts that serve as the nucleus for 
creating all of the files placed in the TEST directory. The first concept involved 
the creation of a number of factors that serve as multipliers for entities found 
within the CAD system. The second concept involved the creation of a base table 
that reflects the distribution of records across the backends of MBDS. This section 
presents these two concepts as the building blocks for the generation of all files 
created by the CAD system. 

a. The Creation of the CAD Factors 

The purpose of the CAD system is to generate the template file, the 
descriptor files, the record files, and the transaction-mix files. A detailed 
discussion of these files is presented later. The first three sets of files are 
indicative of the attribute-based data model. The transaction-mix files are 
generated for benchmarking purposes. The values placed in the template files. 
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descriptor files, the record files, and the transaction-mix files are characteristically 
dependent upon 

• the size of the database (small, medium, or large), and 

• the record class (large, medium-large, medium, or small). 

Vincent’s methodology [Ref. 2] defines the ratios among the database sizes, and 
Strawser [Ref. 5) defines the ratios among the record classes. To represent these 
two sets of ratios, two data constructs are introduced. The data construct 
dbase factor represents the ratios of the database sizes. In our implementation, 
dbase factor is an array of three integers, 1, 2, and 4. The data construct 
record factor represents the ratios of the record classes. In our implementation, 
record factor is an array of four integers, 1. 2, 5. and 10. Both factors are used to 
generate tables of data found in the user's guide, to perform the calculations 
necessary to create the base-record-and-block-distribution table, and to perform 
the calculations for the values of the transaction mixes. 

b. The Base-Record-and-Block-Distribution Table 

The base-record-and-block-distribution table, hereby referred to as 
the base table, represents the records, blocks, and clusters formed and 
distributed across the backends of MBDS. Vincent's methodology [Ref. 2:p. 68] 
defines nine cluster categories, with each cluster category representing a cluster 
containing from 2 to 10 blocks of records per cluster. Thus, the base table reflects 
the distribution of records, blocks, and clusters across 9 categories. Table 7 depicts 
the layout of the base table. 
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TABLE 7. BASE RECORD AND BLOCK DISTRIBUTION TABLE. 

(BASE TABLE) 



Record 


Number of | 


Numoer of 


Total 


Total 


- Total 


Number of 


Size 


Blocks 


Records 


Number 1 


Number 


Number 


Blocks 


in 


per 


per 


of 


of 


of 


per 


Bytes 


Cluster 


Cluster 


j Clusters 


Records 


Blocks 


Backend 




O 

o 

4 
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The values for the matrix are dependent on a few initial calculations which in 



turn are dependent on the user's input. The nature of these calculations is 



discussed later. However, it is appropriate to note which columns are correlated to 



dBase factor and record factor, 
column 1: record_factor 
column 2: remains constant 
column 3: record factor 
column 4: dbase factor 
column 5: record factor and dbase factor 
column 6: dbase_factor 

column 7: dbase factor, number of backends, and configuration number 
2. CAD-Generated Files 

This section introduces the files created by the CAD system for each of 
the three major components. The Database Component has three types of files 



generated to represent the three databases: 



• the Template file- one template file common to all three databases; 

• the Descriptor file- three descriptor files, one per database; and 
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• the Record file- a number of record file- (dependent on the number of 
backends in the system) representing both the response-time-reduction and 
the response-time-invariance configurations. 

The Transaction-Mix Component generates one type of file : 

• the transaction-mix file. 

All together, the CAD system generates twelve transaction mix-files, four files per 
database. Each of the four files per database represent 24 transactions for a given 
record class. 

The Evaluator's Report Component is comprised of two types of files: 

• the standard-text files 

• the empirical-data files. 

The standard-text files present a narrative for the evaluator, providing 
instructions on how to interface the CAD generated test-database and test- 
transaction-mix files with MDBS. The text files also present a discussion for 
interpeting and analyzing the empirical data calculated by the CAD system. The 
empirical data files are a collection of tables that reflect information about the 
generated test-databases and generated test-transaction mixes based upon the 
user's input. A more detailed explanation of each of the files is presented in the 
discussion of its associated component. 

3. The High-Level Program Structure of the CAD System 

The CAD system has been designed utilizing a top-down strategy. The 
system is entitled as the CAD Benchmark System. At the highest level. 5 
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subordinate tasks are performed: 



• Initialization. 

• Receiving User Input. 

• Initial Calculations, 

• Creating Files, and 

• Cleanup. 

The first 3 tasks initialize the state of the CAD system. Within the 4th task, the 
major components of the CAD system are created. The 5th task performs a 
cleanup of all temporary files and allocated memory created by the CAD system 
once the CAD system has finished. 

The high-level program structure can be described in three phases. The 
first phase is the preparation phase which encompasses the initialization of data 
structures, the solicitation of input data from the user, and the initial 
calculations. The second phase is the files creation phase which encompasses the 
generation of all the requisite test-database files, the test-transaction-mix files, 
and the report files. The third phase is the cleanup phase which encompasses 
system commands to purge all files no longer needed once the CAD system has 
completed its objective. 

a. The Preparation Phase 

The Preparation Phase begins the processing of the CAD system. 
Three tasks must be preformed prior to creating any of the three major 
components. Each task is discussed separately. 
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(1) Initialization. 



The initialization encompasses definition 



assignments, array initialization, and the declaration of data types. Variables are 
both global and local. Only the more important features of the initialization are 
addressed. The discussion covers the definitions, the arrays, key variables, and a 
structure representing the base table. 

The more important definitions are those that reflect certain 
known factors about the methodology being implemented These definitions are 
global and are shown in table 8. The maximum number of backends has been 
chosen to be 10. This is not a major constraint on the CAD system. Fixing the 
maximum number of backends has provided for fewer algorithms to be designed, 
coded and tested. 

The more important arrays are those that hold information that 
is common to each of the major components. Two sets of arrays hold such 
information. One set of arrays holds the key attribute names and the other set of 
arrays holds the initial calculations. These arrays are also global and are shown in 
table 9. 



One other array initialization important to the initial calculations 
is the least-common-multiple array known as the 1cm table. The 1cm table 
stores the least-common-multiple values for the integers 1 to 10. Recall that the 
system calculates the least common multiple for the number of backends in the 
system and the maximum number of backends presently allowed is 10. 
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TABLE 8. IMPORTANT GLOBAL DEFINITIONS 



DESCRIPTION NAME ASSIGNMENT 


Number of 
Cluster categories 


num clus cat 


9 


Number of 
database sizes 


num db sizes 


3 


Number of 
record sizes 


num red sizes 


4 


Number of 
record classes 


num red classes 


4 


Number of 
record factors 


num red factors 


4 


Number of 
database factors 


num db factors 


3 


Maximum number 
backends 


max num be 


10 



4G 



TABLE 9. IMPORTANT GLOBAL ARRAYS 



DESCRIPTION 


NAME 


VALUES 


Template names 


t name] 


Templg 






Tempmedlg 

Tempmed 

Tempsmall 


Descriptor names 


INT 1 name] 


INTONELG 






INTONEMEDLG 

INTONEMED 






INTONESMALL 




INT 2 name] 


INTTWOLG 






INTTWOMEDLG 

INTTWOMED 

INTTWOSMALL 


Record factor 


red factor]] 


1 






o 

5 

10 


Database factor 


dbase factor] 


1 






o 

4 


least common 


lc table]’ 


1 


multiple 




»> 

6 

12 

60 

60 

420 

840 

2520 

2520 


four record 






sizes 


four red sz[i 


To Be Calculated 


database 






sizes 


dbase sz[ 


♦t »t 


Number of 
attributes 
per record 






size 


num attr per red sz[| 
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The base table depicted in Table 7 has been implemented 
utilizing a structure construct. Because only specific columns of* the base table 
need to be accessed from, time to time, the structure construct is the easiest to 
implement. Each of the first six columns is referenced as a unique independent 
array within the structure. The seventh column, the number of blocks per 
backend, is dependent on the number of backends in the system. The seventh 
column is described as a structure within a structure because of the nature of the 
response-time- reduction configurations. For each of the response-time-reduction 
configurations, the number of backends matches the configuration number up to 
the maximum number of backends described in the system. For each 
configuration, column 7 is described as a separate structure with each column 
entry for a specific backend being also a unique structure. 

(2) Receive User Input. The Receive User Input module solicits the 
user for the three essential elements of information. The variable names for this 
information are: 



DESCRIPTION 


NAME 


TYPE 


xNumber of Backends 


num be 


integer 


Disk Track Size 


dsk trk sz 


integer 


Maximum Disk Storage 


max dsk storage 


integer 
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(3) Perform Initial Calculations. Eight calculations are performed 



within the Initial Calculations task. Five of the calculations are functions that 
return values assigned to global variables. The remaining three calculations are 
procedures that determine values for the three global arrays. Each of the 
calculations is followed by an explanation. 

The first calculation concerns the number of configurations. The 
number of configurations is equal to twice the number of backends less one. 



THE NUMBER OF CONFIGURATIONS 





Function 




Variable 


Function name 


Parameters Passed 


num config 


Calc config 


num be 








num config = 2 x num be - 1 



The second calculation concerns the least common multiple. By 
accessing the 1cm table array in Table 9. the value of num be being returned is 
the least common multiple, e.g.. if num be = 3. the value returned from the array 
is 6. 
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THE LEAST COMMON MULTIPLE 



Function 



Variable 


Function name 


Parameters Passed 


1cm 


Calc 1cm 


num be 



The third calculation concerns the four record sizes. The first of 
the four records sizes is 1/2 the disk track size. The remaining three sizes are 
multiples of the first. Each remaining record size is calculated by dividing the first 
record size by the respective record factor (the red factor array in Table 9). 



THE FOUR RECORD SIZES 



Procedure 


array 


Procedure name 


Parameters Passed 


four red sz 


Four record sizes() 


dsk trk sz 



The fourth calculation concerns the available disk storage. The 
available disk storage is 80% of the maximum disk storage. 20% of the disk 
storage is reserved for the MBDS directory. 
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THE AVAILABLE DISK STORAGE 



F unction 


Variable 


Function name 


Parameters Passed 


avail dsk storage 


Calc avail dsk storage 


max dsk storage 



The fifth calculation concerns the database multiple. The 
database multiple is calculated by multiplying the least common mutiple by the 
number 32 by the largest record size (from four red sz array of Table 9). 
Chapter two presented the calculation for the database multiple in detail. 

THE DATABASE MULTIPLE 



Function 



Variable 


Function name 


Parameters Passed 


dbm 


Calc database multiple 


1cm 


dbm = 1cm x 32 x four red szdg red sz . 



The sixth calculation concerns the database sizes. Recall from 
chapter II the total number of times the database multiple can be equally divided 
into the available disk storage results in a variable called folds. 
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THE DATABASE SIZES 



! 

Procedure 



Array 


Procedure name 


Parameters Passed 


db size[] 


Database sizesQ 


dbm, avail dsk storage 



Thus, the largest database size equals the database multiple multiplied by the 
number of folds: 

db sizejlargest] = dbm x folds. 

The remaining two databases (medium and small) are calculated by dividing the 
largest database size by the respective database factor. 

The seventh calculation concerns the number of attributes per 
record size. The number of attributes per record size is calculated by dividing the 
record size by the numeric common divisor 10. The reason for this calculation is 
explained in the Generated Test-Database Component section. 



THE NUMBER OF ATTRIBUTES PER RECORD SIZE 



Procedure 


Array 


Procedure name 


Parameters Passed 


num attr per red sz 


Calc num attr for ea red sz 


- 



The eighth calculation concerns the key number of records. The 
key num reds is a variable that identifies the number of records in the largest 
record class of the smallest database. This number of records is considered to be 
the key because every other record class size in each of the databases is a multiple 
of this number. The key num reds variable is the determining factor for all 
calculations made with respect to the base record and block distribution table. 
Once the initial calculations are made the Create Files Phase begins. 



THE KEY NUMBER OF RECORDS 



Function 


Variable 


Function name 


Parameters Passed 


key num reds 


Calc key num records 


- 


key num reds = (dbase szismalli num red classes) / four red szllarge . 



b. The Create-Files Phase 

The five modules of the Create Files Phase comprise the creation of 
the three major components of the CAD system: the generation of the test- 
database files, the generation of the test-transaction-mix files, and the generation 
of the report files. Most of the report files are generated simultaneously with the 
other two files. There is however one module dedicated solely to the generation of 
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one specific report. This section present- the high-level modules invoked for each 



of the major files. 

(1) Generating the Test-Database Files. Upon completion of the 
initial calculations, the CAD system begins the generation of the test-database 
files, Three high-level modules are invoked: 

• Make template file, 

• Make descriptor files, 

• Make record files. 

Each of these modules in turn calls subordinate modules which generate the the 
appropriate requisite files. The details of these modules are presented in the Test- 
Database-Generation Component. 

(2) Generating the Transaction-Mix Files. Upon completion of the 
test-database files, the CAD system begins the generation of the test-transaction- 
mix files. Just one high-level module is invoked for this task: 

• Generate trans mixes. 

The details of this module are presented in the Test-Transaction Mix Generation 
Component. 

(3) Generating the Report Files. The CAD system invokes just one 
module to generate a specific report which is placed into 3 files. The module being 
invoked is: 

• Format test benchmark databases. 
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The specific report, which has its own dedicated module, provides the user with 
three tables of information. Each table is a file and is indicative of a database. 
The information in the table depicts the test configurations for a specific database 
size. Table 10 depicts the test configurations for a small database. The system 
parameters resulting in a specific configuration, for example, are: 

available disk storage: 300 Megabytes 

disk track size: 4000 bytes 

number of backends: 3 



All of the other report files are comprised of information depicting data used in 
calculations, in relationships, and for attribute values. These report files are 
generated concurrently as the data is being generated within either of the other 
two major components. These report files include: 

• the records-per-block-relationship table. 

• the number-of-records-per-c luster-category table. 

• the record-<k-block-distribution table for each database. 

• 4 transaction-mix files per database, and 

• 4 transaction-mix workload files per database. 

The standard text files are not generated by the CAD system. These files are 
actually part of the CAD system and are interleaved with the generated report 
files when the evaluator's report manual is assembled, 
c. The Cleanup Phase 

The cleanup phase is comprised of a number of system commands to 
purge all of the generated CAD files once the user has no further need for them. 
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TABLE 10 . SMALL DB TEST CONFIGURATION'S 



Configuration 


Number 


Record 


Number o' 


My bytes 


Database 


Number 


of 


Size in 


Records per 


per 


Size in 




Backends 


Bytes 


Backend 


Backend 


Mbytes 


1 


1 


2000 


9372 


18744000 








1000 


18744 


18744000 








400 


46860 


18744000 








200 


93720 


18744000 


74.976 


o 


o 


2000 


4686 


9372000 








1000 


9372 


9372000 








400 


23430 


9372000 








200 


46860 


9372000 


74.976 


O 


o 


2000 


3124 


6248000 








1000 


6248 


6248000 








400 


15620 


6248000 








200 


31240 


6248000 


74.976 


4 


2 


2000 


9372 


18744000 








1000 


18744 


18744000 








400 


46860 


18744000 








200 


93720 


18744000 


149.952 


5 


-> 

xJ 


2000 


9372 


18744000 








1000 


18744 


18744000 








400 


46860 


18744000 








200 


93720 


18744000 


224.928 



B. THE TEST- DATABASE-GENERATION COMPONENT 

The objective of the database component is to pass to a TEST director.’ all of 
the necessary files that represent the three database sizes, small, medium and 
large. Recall that the CAD constructs a tree director' for indexing the report and 
for each of the respective databases and its associated database and transaction- 
mix files. With respect to the generated test-database, the director’ contains the 
three types of database files described by the attribute-based data model and 
utilized by MBDS. The three types of files are the template file, the descriptor 
file, and the record file. These files, generated by the CAD system, are manually 
loaded to MBDS. 
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The database component is described in two parts. The first part describes 
the generated test-database files and the second part addresses the database 
program modules. 

1. The Generated Database Files 

The CAD system generates three database sets, a large database, a 
medium database, and a small database. Each database may be described in 
terms of the generated files that represent it. i.e.. the template file, the descriptor 
file, and the record files. 

a. The Template File 

Each of the database sets (small, medium, and large) shares a 
common single file known as the template file. This file contains four independent 
templates, with each template associated with a different record class. Each 
template contains the names of both the directory and non-directory attributes, 
and reflects the structure and the type of each of the record classes. By class we 
recall that Strawser [Ref. 5] creates four record classes, large, medium-large, 
medium and small. By type we mean the type of the attribute values, i.e.. either 
string (s). integer (i). or floating number (f). 

b. The Descriptor File 

Each of the three database sets has its own unique descriptor file. 
The descriptor file contains indexing information for each directory attribute in 
the database. This indexing information takes two distinct forms. First, a 
directory attribute may be described by indexes which represent a list of the 
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possible values that the directory attribute may be assigned la type-B attribute). 
Second, the directory attribute may be described by indexes which represent 
attribute-value ranges (a type- A attribute). Each descriptor file identifies the 
directory attributes: the template attribute, and the descriptor attributes. The 
template attribute receives as a value one of four template names describing the 
record size class: template large (Templg), template medium-large 

(Tempmedlg). template medium (Tempmed), or template small 

(Tempsmall) . 

The names given to the descriptor attributes identify the descriptor 
attribute itself, and the record class associated with it. Thus the names for the 
descriptor attribute is composed of three parts. The first part is entitled INT 
meaning integer. The second part is entitled either ONE or TWO. ONE 
associates the descriptor attribute to the nine cluster categories. TWO associates 
the descriptor attribute to the number of records per cluster. The third part 
associates the descriptor attribute to the record class. LARGE (LG). MEDIUM 
LARGE (MEDLG), . MEDIUM (MED). OR SMALL (SMALL). All 
together, there are eight possible attribute names: 



INTONELG 

INTONEMEDLG 

INTONEMED 

INTONESMALL 



INTTWOLG 

INTTWOMEDLG 

INTTWOMED 

INTTWOSMALL 
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Nine sets of values are assigned to the INTONExxx descriptor attributes, each set 
representing the range values associated with a specific cluster category. The set 
of values assigned to the INTWOxxx descriptor attributes are determined by the 
maximum range value found in the corresponding INTONExxx attribute, and 
the number of records per cluster for that specific cluster category. For example, 
given a database that has as part of its record configuration 9372 2000-bvte 
records, table 11 reflects the range values for each of the descriptor attributes, 
c. The Record File 

Two distinct classes of record files are generated by the CAD system. 
The first class of record files represents the response-time-reduction (rtr) 
configurations. Each database (small, medium, and large) has its own unique rtr 
record file. The rtr file is associated with all of the rtr configurations of a given 
database because the size of that database does not change. 

The second class of record files represents the response-time- 
invariance (rti) configurations. Each database has its own set of response-time- 
invariance record files. Each rti record file is associated with a specific rti 
configuration. Recall from chapter II that the number of rti configurations is 
dependent on the number of backends. We should note however that the number 
of rti record files per database set is the same. 

The record file contains the information the user desires to have 
stored in the database. The information the CAD system places in each record 
takes five distinct forms. Each form is associated with a specific type of attribute. 
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TABLE 11 



INTONELG ATTRIBUTE VALUE RANGES. 
INTTWOLG ATTRIBUTE VALUE RANGES. 



INTONELG 
Range 
of Values 


INTTWOLG 
Range 
of Values 


1 1 ; 3 4 4 { 


\VA 
(5; 8 

1341:344] 


345:866] 


|345;350] 

[35l;356l 

[861 ; 866] 


|867;1,562| 


|667;874l 

|875;882j 

1 1 , 553; 1 . 56 2 j 


1,563;2,432] 


1 1 ,563; 1 .5 72] 
1 ,573; 1 , 582] 

|2, 423;2.432| 


2,433:3.476] 


|2.433;2.444| 

2.445;2.456| 

I3.465;3,476| 


3. 4 7 7;4 . 694 1 


|3.477;3,490| 

I3.491;3,504| 

4.681;4.694] 



GO 



TABLE 11 CONTD 



INTONELG ATTRIBUTE VALUE RANGES. 
INTTWOLG ATTRIBUTE VALUE RANGES. 



INTONELG 
Range 
of V alues 


INTTWOLG 
Range 
of V alues 


!4,695;6,086] 


|4, 695;4.710] 
|4,711;4.726] 




|6,071;6.086] 


|6,087;7.652] 


16.087:6,104) 

|6,105;6,122| 




|7,635;7,652] 


17,653:9,3721 


|7.653;7,672] 
|7, 673:7.692] 




|9,353:9.372| 



Recall that the attribute-based data model considers data in terms of several 
constructs, two of which are directory keywords, and non-directory keywords. The 
first three forms of information are the values of the template attribute and the 
two descriptor attributes. These attributes are directory keywords. The other 
attributes of the record are non-directory keywords. 

The first of the non-directory keyword attributes and the fourth form 
of information is described by an attribute entitled Multiple. The values of the 
multiple attribute are character strings representing how many times the database 
size has been multiplied, e.g.. "One”. "Two", "Three”, . . ., etc. For example, if 
the database size has been doubled, the value of the multiple attribute is "Two". 
The effect of the multiple attribute is to double, triple, etc., the size of all existing 
clusters without defining any new clusters. Further detailed discussion of the 
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ciustering concept is found in iRef. 2 . In the response-time reduction record file, 
the multiple attribute always has a value of "One". The response-time-invariance 
record file multiple attribute's value ranges from "Two" to the character string 
representing the number of times the database size increases, e.g.. for a system 
with 2 backends, the multiple attribute is assigned the values "One" and "Two". 
The database is doubled by duplicating the original record file and appending the 
duplicated file to the original file thus creating a new record file twice the size of 
the original. However, in the duplicated file the multiple attribute's value is 
changed from "One" to "Two", thus every record in the new database is unique. 
Table 2 in chapter II characterizes this assignment. 

The fifth form of information is found in the attribute entitled 
String. The purpose of the string attribute is to represent sufficiently large 
nonprocessing data in order to make up the size of the record classes. All 
subsequent attributes are string attributes as well. The number of string 
attributes is dependent on the record sizes. Recall from Strawser's scheme [Ref. 
2], four record classes are defined as large, medium-large, medium, and small. 
While the large record size is a function of the disk track size of the system, the 
remaining three sizes are multiples of the largest record size. The ratios are 1, 1/2. 
1/5. and 1/10. 

The number of string attributes is dependent upon the size of the 
respective record class. Because MBDS requires all attributes in a record to be the 
same size, a common divisor to all four record classes is selected. The CAD system 
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meets this requirement by soliciting the disk track size in increments of 1000. The 
increments of 1000 can best represent actual disk track sizes which are normally 
in increments of kilo bytes, where a kilo is actually 1024 bytes. In receiving the 
disk track size in increments of 1000. a common divisor of 10 ensures the number 
of attributes for each record size conforms to the record class ratios. For example, 
a system configured with a 4000 byte disk track size has 4 record sizes of 2000 
bytes, 1000 bytes, 400 bytes, and 200 bytes. By selecting a common divisor of 10, 
we set the attribute size to 10-bytes per attribute. Table 12 show the number of 
10-byte attributes corresponding to each record class. 

TABLE 12. NUMBER OF 10-BYTE ATTRIBUTES PER RECORD CLASS. 



Record Size 
in Bytes 


Number of 
Attributes 


2000 


200 


1000 


100 


400 


40 


200 


20 

1 



A 2000 byte record has 200 attributes of which 196 are string attributes; a 200 
byte record may have 20 attributes of which 16 are string attributes. Each string 
attribute’s value is a string of x’s, i.e.. "Xxxxxxxxx". When the database is 
queried by the generated transaction mixes, the strings' values take on greater 
significance. In the transaction process the values are changed to more meaningful 
textual information. This process is explained later. Tables 13 and 14 depict the 
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TABLE 13 



RECORD LAYOUT FOOTNOTES 



The Attribute Template s value is a character string representing 
one of four record sizes, large, medium-large, medium, and small. In this 
example the record sizes represented are 2000, 1000, 400, and 200 byte records. 

The names INTONExx and INTTWOxx represent the descriptor attributes. 

The Attribute Multiple reflects the size of the database, i.e.. 

One for size N, Two for 2N, Three for 3N, etc. A one-to-one mapping exists 
between the descriptor-id set and the value of this attribute. The Multiple 
Attribute is used to double, triple, etc., the size of all existing clusters, 
without defining any new clusters. 

* The number of string attributes is dependent upon the size of the 
respective record class, i.e., the 2000-bvte record may have 200 attributes 
of which 196 are string attributes: the 200-bvte record may have 20 attributes 
of which 16 are string attributes. 



response-time reduction record layout for a small database for a system featuring 
300 Megabytes of available disk storage with a 4000 byte disk track size, 
d. The Generated Test-Database Report Files 

Within the test-database-generation component two report files are 
created concurrently as two relationships are calculated. The first involves the 
records-per-block relationship. Recall from chapters I and II that the disk track 
size is the block size by which MBDS processes information from secondary 
memory on the backend to primary memory on the controller. Given the 
requirement to pass four different record sizes via the block, a records-per-block 
relationship is recognized. Table 15 depicts the relationship for a disk track size of 
4000 bytes. 
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The second relationship involves record-per-cluster-category. This 

relationship is much like the first but a bit more complex. Recall from the section 

TABLE 14 



LOGICAL LAYOUT OF A 
RESPONSE TIME REDUCTION RECORD FILE 

(small database) 

(4000 byte disk track size) 

(300 Mbytes of available disk storage) 


Directory Keywords 


Non-Directory Keywords 


Template 


INTONExx* 


INTTWOxx* 


Multiple" 


StringOOl * 




String— x 


TEMPLG 


1 


1 


One 


Xxxxxxxxx 




Xxxxxxxxx 


TEMPLG 


2 


2 


One 


Xxxxxxxxx 




Xxxxxxxxx 
















TEMPLG 


9372 


9372 


One 


Xxxxxxxxx 




Xxxxxxxxx 


TEMPMEDLG 


1 


1 


One 


Xxxxxxxxx 




Xxxxxxxxx 
















TEMPMEDLG 


18744 


18744 


One 


Xxxxxxxxx 




Xxxxxxxxx 
















TEMPMED 


1 


1 


One 


Xxxxxxxxx 




Xxxxxxxxx 
















TEMPMED 


46,860 


46.860 


One 


Xxxxxxxxx 




Xxxxxxxxx 


TEMPSMALL 


1 


1 


One 


Xxxxxxxxx 




Xxxxxxxxx 
















TEMPSMALL 


93,720 


93.720 


One 


Xxxxxxxxx 




Xxxxxxxxx 
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TABLE 15. THE RECORDS-PER-BLOCK RELATIONSHIP 



Record Sizes 
in Bytes 


Records 
per Block 


2000 


2 


1000 


4 


400 


10 


200 


20 



describing the CAD characteristics the need for nine cluster categories. This 
relationship describes the number of records per cluster category recognizing that 
each cluster category has a different number of blocks. Table 16 depicts this 
relationship. 



TABLE 16. THE NUMBER OF RECORDS PER CLUSTER CATEGORY TABLE 



Blocks 

per 

Cluster: 


Record Size in 


Bytes: 




2000 


1000 


400 


o 

o 

ft 


o 


4 


6 


20 


40 


O 

v/ 


6 


12 


30 


60 


4 


8 


16 


40 


80 


5 


10 


20 


50 


100 


6 


12 


24 


60 


120 


7 


14 


28 


70 


140 


8 


16 


32 


80 


160 


9 


18 


36 


90 


180 


10 


20 


40 


100 


o 

o 

ft 



2. The Generated Test-Database Program Modules 

This section discusses the CAD program structure for the generation of 
the test databases. Recall from the section on Generating the Test-Database Files, 
that three high level modules are invoked to generate the test-database files: 

• Make template_file. 

• Make descriptor files, and 
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• Make template file. 

• Make descriptor files, and 

• Make record files. 

Each module is tasked to generate one of the three types of files germane to the 
test-database component. 

• a. The Generate-Template-File Module 

The generate-template-file module creates one file containing four 
templates, one template for each record class. For each template the module 
writes to the file all the attributes' names: the template name, the descriptor 
names, the name multiple for the multiple attribute, and the strings* names. To 
determine the appropriate number of string attributes, the module accesses the 
number-of-attributes-per-record-size array (see Table 12). 
b. The Generate-Descriptor-Files Module 

The generate-descriptor-files module invokes five subordinate 
modules. Each of these modules is a procedure that is a prerequisite to the 
following modules with the exception of the Backend table module. The five 
subordinate modules are: 

• the Create records per block relationship table module. 

• the Create the record per cluster category table module, 

• the Create the base record block distribution table module. 

• the Backend table module, and 

• the Write descriptor files module. 

We discuss each of these modules in turn. 
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(1) The Record^-Per-Block-Relationship Module. This module 



accesses the disk track size variable and the four record sizes array to calculate the 
relationships. A 2x2 array stores the relationship between the records per block 
and the record size themselves. For each record size, the number of records per 
block is calculated by dividing the disk track size by that record size. 

(2) The Records-Per-Cluster-Category Module. This module accesses 
the records-per-block-relationship array and another array that stores the number 
of blocks per cluster category information to create a records-per-cluster-categorv 
array. For each cluster category, the number of bytes per record size is calculated 
by multiplying the records per block by the number of blocks per cluster. 

(3) The Base Record-Block-Distribution Module. Table 7 depicts the 
logical layout of the base-record-&>block-distribution table (base table). To 
complete this table three data elements must be accessed: the records-per-cluster- 
category-table array, the key number of records variable, and the number of 
backends variable (num be). Columns 1, 2. and 3 of the base table are a 
duplicate copy of the information stored in the largest record size of the records- 
per-cluster-categorv-table array. 

Column 4. the total number of clusters per cluster category, 
involves a number of calculations. The first calculation involves summing the 
entries in column 3 of the base table. This value is the number of records per 
cluster category for all nine categories. The abbreviated variable name 
ttl reds distr has been given to this variable for denoting the total number of 
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records for distribution. This value is then divided into the key number of 



records. If the division is without a remainder, then the total number of clusters 
(cluster size) for each cluster category is the same. Otherwise, the remainder is 
truncated and 1 cluster is added to the cluster size. When a remainder exists, a 
record overflow is incurred by the addition of the single cluster. The record 
overflow is calculated and then divided by the average number of records per 
cluster. This variable is named clus id. This new value identifies the number of 
cluster categories that now must be decremented by a single cluster to insure the 
distribution of all records (key number of records). The cluster categories selected 
to be decremented are evenly distributed across all 9 categories. 

Column 5. the total number of records per cluster category, is 
calculated by multiplying each row entry of column 2 by the corresponding row 
entry in column 3. Column 6, the total number of blocks per cluster category, is 
calculated by multiplying each row entry in column 2 by the corresponding row 
entry in column 3. 

Column 7, the number of blocks per backend, is dynamic in 
nature and is dependent on the number of backends as well as directly associated 
with the configuration number. A subordinate module is invoked to calculate the 
appropriate values for a given case. For each entry in column 7, given the 
configuration, the value is equal to the total numbers of blocks per cluster 
category (column 6) divided by the number of backends in use. 
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(4) The Backend-Table Module. The backend-table module writes a 



report file that depicts the record-dL>block-distribution table for each record class 
(large, medium-large, medium and small) within each database (small, medium, 
large). The base record-^ -lock-distribution table is accessed and those columns 
which are dependent on the record factor, database factor, or both factors are 
multiplied by the appropriate factor with respect the record class and the 
database. 

(5) T he Write-Descriptor-Files Module. The write-descriptor- files 
module writes a descriptor file for each of the three databases. The module 
accesses the template names in the t name array and writes them to the file. The 
module then accesses the total number of records per cluster category column 
(column 5) of the base table to determine the set of values for each of the 8 
descriptor attributes. The module uses the mathmatical formula devised by 
Vincent [refV.plOl] and the database and record factors presented as 
characteristics of the CAD to calculate the values, 
c. The Generate-Record-Files Module 

Each database has its own set of record files. The generate-record-files 
module creates a dynamic number of record files for each database dependent on 
the number of backends. Recall from the discussion of the record file that two 
distinct classes of record files are generated by the CAD system. One set is the rtr 
configuration records which comprises a single file. For each of the rti 



70 



configurations, a separate file is generated and whose size is a multiple of the rtr 
file. The generation of the record files is accomplished as in two steps. 

In the first step, the initial file to be opened is the record file 
representing the response-time-reduction configurations. The size of this particular 
file is the base size for all the multiple size rti record files. For each of the record 
classes the appropriate number of records is written. First, the values for the 
descriptor attributes are written into the record and then the value "One" is 
written for the multiple attribute. For each of the number of string attributes 
(which is different for each record size) the value "Xxxxxxxxx" is stored. The 
total number of records to write is determined by summing all of the entries in 
the total-number-of-records-per-cluster-category column (column 5) and 

multiplying the sum by the record and database factors. Once this evolution has 
been completed for each record class, the file is closed. 

In the second step, before a new file is opened, a copy of the last file 
written is made. Then a new file is opened and appended to the copy. For each of 
the response-time-invariance configurations, the multiple attribute value is 
increased by one. This evolution terminates when the requisite number of record 
files have been written. The entire process encompasses both the rtr file and the 
appropriate number of rti files. The two steps are performed for each database 
(large, medium, and small). 
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C. THE TEST-TRANSACTION-MIX-GENERATION COMPONENT 



The objective of the transaction-mix component is to pass to the TEST 
directory all the transaction-mix files that represent the four record classes. Each 
of the three database sizes has four transaction-mix files, one for each record class. 
The transaction-mix component also generates a number of report files that 
portray information concerning the mixes themselves. 

1. The Generated Test-Transaction-Mix Files 

The CAD system generates three database sets, a large database, a 
medium database, and a small database. For each database three sets of files are 
generated. The first set of four files is the transaction-mix files, one for each record 
class. The second set (four files, one for each record class) is a printout of the 
transaction-mix report file. The third set (four files, one for each record class) is a 
printout of the test-transaction-mix-workload file. The second and third sets are 
used as an integral part of the evaluator's report. As each transaction request is 
generated for a specific record class within a specific database, the transaction-mix 
file, the transaction-mix report file, and the transaction-mix-workload file is 
written concurrently. 

Appendix B includes the test-transaction-mix report and workload for the 
large record class of the small database for a system with 3 backends, a 4000 disk 
track size, and 375 Mbyte disk storage capacity. We encourage the reader to refer 
to the appendix as the remainder of the section is being read. 
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a. 



The Transaction-Mix Files 



Vincent [Ref. 2:p. 104-120] has identified 30 transaction requests that 
encompass the five types of transactions found in MBDS, the retrieval, the 
insert, the update, the delete, and. the retrieve-common. The 30 
transactions are further grouped into 7 request sets, each set designed to test the 
system with overhead-intensive or data-intensive type of operations. The 30 
transactions are created for each record class within a database and comprise a 
separate file. All together, twelve transaction files are written. 

The CAD system presently calculates only the first 24 transactions (6 
request sets) of the 30 suggested by Vincent. These first 24 are the most 
important with regard to the system testing. The remaining 6 transactions (the 
7th request set) are to be implemented at a later date. 

b. The Generated Test-Transaction-Mix-Report Files 

Two types of report files are generated within the test-transaction- 
mix component. The first type of report prints the actual transaction request sets 
with respect to each record class within each database. All together, twelve 
transaction mix files are generated. 

The second type of report files generated in this component are the 
workload statistics for each of the requests in each of the request sets. Again (with 
respect to the record class and the database) twelve workload files are created. 
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2. The Generated-Test-Transaction-Mix-Program Modules 

The Test-Transaction-Mix-Program Modules immediately follow the 
Test-Database Program Modules. Within the test-transaction mix component 
there is only one high-level module, 

• Generate trans mixes. 

Within this high-level module, the subordinate Generate-database-transaction- 
mixes module is invoked for each of the three databases, small, medium, and 
large. 

a. The DB-Test-Transaction-Mix Module 

For each of the four record classes this module opens three files (the 
transaction-mix file and the two report files) and invokes 6 subordinate modules. 
Each of the six subordinate modules generates one of the six request sets. 

b. The Generate-Request-Set Modules 

There are six generate- request-set modules. Each module represents 
one of the five types of transactions. Each transaction request is designed to test 
the system in terms of overhead or data intensive operations. For each request 
within each request set, the template and descriptor names are identified and the 
range values appropriate for that specific request are calculated. Instrumental to 
each calculation are the record and database factors. 

In determining the values for the descriptor attributes, the CAD 
system targets specific cluster categories depending on the nature of the specific 
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request. For example, if the request involved 25% of the database, cluster category 
4 would be the target. Within cluster category 4 the record range value for 25% of 
the database would be found. 

Each request within a given request set is generated by a separate 
subordinate module. Within each of these subordinate modules, the test- 
transaction-mix report and the test-transaction-mix workload files are written. 
The same names and values calculated for the transaction mix are written to the 
test-transaction-mix-report file. For Request sets 1, 2, 3, 5, and 6 the workload of 
the request is described in terms of the number of clusters examined, the volume 
of the database accessed, and the volume of the database transacted. The 
workload for Request 4 is unique and will be described later. 

(1) Generating Request Set 1. Request set 1 is comprised of three 
retrieval requests. Request #1 is overhead-intensive designed to retrieve a very 
small selection of records. Request fl accesses the first cluster category and 
retrieves 1 cluster of records per each backend. Request 42 is data-intensive 
designed to access 2 cluster categories to retrieve a small selection of records. 
Request #3 is data-intensive designed to retrieve 25% of the database records 
while accessing just a fraction more than 25% of the database. 

(2) Generating Request Set 2. Request set 2 is comprised of three 
update requests, all of which are data intensive. Request #4 identifies 1/8 of the 
database and updates STRINGOOl's value from "Xxxxxxxxx" to "Oneeighth". 
Request #5 updates 1/4 of the database and updates STRINGS005's value from 
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"Xxxxxxxxx" to "Onequartr". Request =G identifies 1/2 of the database and 
updates STRINGOlO’s value from "Xxxxxxxxx" to "Onehalf". 

(3) Generating Request Set 3. Request set 3 is comprised of five 
retrieval requests, all of which are data intensive. Request #7 identifies the 
descriptor range value that is exactly 1/8 of the database and retrieves the records 
whose STRINGOOl’s value is "Oneeighth". 

Requests #8, #9, and #10 retrieve portions of the database that 
are 1/8, 1/4, and 1/2 of the database, respectively. They identify the appropriate 
String attributes that were updated in Request Set 2. What they do not do is 
identify the descriptor range value that is exactly 1/8. 1/4. and 1/2 of the 
database, thus every cluster category (100%) of the database is examined! 
Request #11 identifies the descriptor range value for 1/2 of the database whose 
STRINGOlO’s value is "Onehalf’. 

(4) Generating R equest Set 4. Request Set 4 is invoked just 3 times 
because it performs a retrieve-common operation of two record classes, i.e.. of the 
large and medium-large, of the medium-large and medium, and of the medium 
and small. Request set 4 is comprised of three requests. Request #12 is 
overhead-intensive. Request #13 is data-intensive, and Request #13 is both 
overhead and data-intensive. 

Request #12 uses the same values from Request #1 for its first 
retrieve. Its second retrieve value is a multiple of its first. Request #13 retrieves 
one half of the database from each record class. The STRING010 attribute’s value 
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"Onehalf" is the determinant. Request =14 targets l/S of the database for both 
of its retrieve operations. Request #14 uses the same values from Request =7 for 
its first retrieve and makes the appropriate calculations for the second retrieve. 
Because no string attribute is targeted, descriptor attribute values are the 
determinants. 

For each of the retrieve operations (the first retrieve being the 
source request and the second retrieve being the target request), the workload is 
described in terms of the number of clusters examined, the number of records 
accessed, and the number of relevant records to the retrieve-common request. For 
the retrieve-common request as an entire entity, the number of records retrieved 
resulting from the request is also depicted in the workload. All the workload data 
elements are written in table format as a part of the file. 

(5) Generating Request Set 5. Request Set 5 is comprised of two 
insert requests. Request £15 inserts a record into an existing cluster. A record 
from the present record class is duplicated with the exception of the multiple 
attribute. The new value placed in the multiple attribute is one more than the 
maximum possible value with respect to the number of backends. E.g., if a system 
has had 3 backends whose last response-time invariance configuration has a 
multiple attribute value of "Three", the new value would be "Four". 

Request £16 inserts a record into a new cluster. A record from 
the first cluster category is duplicated with the exception of the INTWOxxx 
descriptor attribute (which identifies the number of records per cluster). By 
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determining the maximum range value for this descriptor in this specific cluster 
category, a new value larger than the maximum range value is selected as the 
appropriate value. By defining a new descriptor whose value is not within the 
range limits of those clusters, a new cluster is created for that record to be 
inserted. 

(6) Generating Request Set 6. Request Set 6 is comprised of eight 
delete requests. Request #17 is overhead-intensive. Request #18 is both data and 
overhead-intensive, and Requests #19 through #24 are data-intensive. 

Request #17 and #18 mirror retrieve Request #1 and #2, 
respectively. Request #20 - #24 mirror retrieve Request #7 - ill. respectively. In 
all of these requests, the workload is exactly the same as the mirrored request. 
Request #19 corresponds to Request #3's workload. However the last 25% of the 
database is deleted. Again the target cluster category is identified (category 8) 
and the appropriate value is calculated. 

D. THE REPORT COMPONENT 

The evaluator’s report component provides the user with a detailed 
explanation of the benchmarking process. The report is comprised of standard 
text files interleaved with a number of tables encapsulating data and statistical 
information about the system defined by the user's input. Given the three 
essential elements of information solicited from the user, the CAD system 
produces a report that provides information describing four main topics: the 
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generated test databases, the generated test-transaction mixes, the TEST 
directory and all the files contained therein, and the instructions on how to 
integrate the CAD generated test files with MBDS. 

The reports describing the generated test databases are: 

• the test-configurations report for each of the given databases. 

• the record-block-relationship report for the base table, 

• the records-per-cluster-category-relationship report for the base table, and 

• the record-&-block-distribution-table report for each record class for each 
given database. 

The reports describing the generated test-transaction mixes are: 

• the transaction-mix report for each record class within a given database, and 

• th transaction-mix workload for each record class within a given database. 

Appendix B includes examples of all the generated report files. 

The standard text files are presently being drafted. Once drafted and placed 
within the CAD system, calls are to be coded to interleave the text files with the 
generated report files. The purpose of the text files is two-fold. The first objective 
is to explain exactly the nature of the generated test report files found within the 
TEST directory. The second objective is to explain interfacing the generated test 
files with MBDS. 

Most of the CAD report files are generated concurrently with the test files 
within the other two major components. However, the CAD system does call one 
high-level module dedicated to generating three specific sets of tables, each table 
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being a unique report file. Each table (or file ) depicts the test configurations for a 
given database, small, medium, or large. 

For each database size, the module formats the response-time- reduction 
configurations followed by the response-time-invariance configurations. Several 
data elements must be accessed: the four-record-sizes array (four red sz), the 
database-sizes array (dbase sz), the number of backends (num be), and the 
number of configurations (num config). For the rtr configurations the database 
size (in Mbytes) is accessed from the database-sizes array. Incrementing through 
each configuration number with respect to the number of backends, the number of 
records per backend is calculated as well as the Mbytes per backend for each 
record class. The data is written to a file in table format. 

For the rti configurations the same algorithm applies with the exception of 
increasing the database size (double, triple, etc.) and with respect to the 
configuration number. 
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IV. CONCLUSIONS 



Database management systems have taken three approaches to information 
processing, the traditional mainframe-based approach, the software single-backend 
system, and the software multiple-backend system. The performance and upgrade 
problems identified with the first two approaches are overcome by the third 
approach, the software multiple-backend system, by providing gains through 
specialization of the database operations on dedicated, multiple backends. 

Two goals of the software multi-backend database system are to overcome the 
performance problems and upgrade issues of the traditional mainframe-based and 
the conventional software single-backend database systems. The first goal is to 
produce a reciprocal decrease in the response times of the user transactions by 
increasing the number of backends while the size of the database and the size of 
the responses to the transactions remain constant. The second goal is to produce 
invariant response times for the user transactions by increasing the number of 
backends proportionally to the increase of the transaction responses. The first 
goal allows the multiplicity of the backends of the database system to be directly 
related to the performance gains of the database system in terms of the response- 
time reduction. The second goal enables the multiplicity of the backends of the 
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system to be directly related to the capacity growth of the system in terms of 
response-time invariance. 

To verify the aforementioned performance and growth-capacity claims. 
Vincent [Ref. 2] has formulated a benchmarking methodology for software 
multiple-backend database systems. In this thesis, we have presented a 
computer-aided design (CAD) system for the generation of test databases and 
test-transaction mixes that can be used for the purpose of benchmarking parallel, 
multiple-backend computer systems, specifically the Multiple-Backend Database 
System (MBDS). 

To fully understand the implementation of the CAD system, we have 
reviewed the essence of Vincent's methodology, specifically the test-databases and 
the test-transaction-mix design factors. We have also described the prototyped 
multi-backend database system at the Laboratory for Database Systems Research. 
Naval Postgraduate School. Monterey. California, including a discussion of the 
attribute-based data model and the attribute-based data language. 

The most salient features of the CAD system are two characteristics that are 
an integral part of each of the three major components. The first characteristic 
involves the creation of a number of factors that serve as multipliers for entities. 
Specifically there are two factors, one for the database and one for the record 
classes. The second characteristic involves the creation of a base-record-&>block- 
distribution table whose numeric entries reflect the upper bounds on the number 
of records, blocks, and clusters formed and distributed across the backends of 
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MBDS. The values in the table are used to generate the appropriate number of 
records needed for a precalculated database size. They are also used to determine 
the range values of descriptor files, and they serve to determine the values in the 
transaction mixes as well. 

The CAD system has been designed to receive a minimal amount of 
information concerning a specific database system from the user an to transform 
this input into the requisite test databases and test-transacti> mixes. The 
framework of the CAD system is built around three major compoi. nts, the test- 
database-generation component, the test-transaction-mix-generation component, 
and the evaluator-report component. The final output resulting from the 
combined w T ork of the major components is the CAD system's generation of two 
sets of files placed in a TEST directory for the user. The first set of files is 
expressly for testing MBDS. The second set of files comprises a number of reports 
describing the test databases and the test-transaction mixes. In conjunction with 
the second set of files, the CAD system interleaves a number of standard text files 
that present a narrative for the evaluator providing instructions on how to 
interface the CAD generated test-database and test.-transaction-mix files with 
MDBS. The text files also present a discussion for interpeting and analyzing the 
empirical data calculated by the CAD system. The text files are presently being 
drafted and are to be incorporated into the CAD upon completion. 

The CAD system described in this thesis is a first version. Presently the versin 
generates the test databases and test-transaction-mix files and places them into a 
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TEST directory. The system provides the user with an evaluator's report that 
includes instructions on how to manually load MBDS with the generated test files. 
The second version is to be integrated with MBDS allowing the testing process to 
be controlled and managed by the CAD system. The third version will add 
components to collect statistics (e.g. response times) for the different tests and 
calculate statistics (i.e.. mean and standard deviation of tests, response-time 
reductions and response-time invariances) that measure the performance of 
MBDS. 
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APPENDIX A: THE CAD SYSTEM SPECIFICATIONS 



Task: CAD BenchmarkSystem 



/*The following are global arrays 

t name : an array of the four template names, Tempi g. Tempmedlg, 

/* Tempmed , and Tempsmall 



INTl name: an array of the four descriptor ONE names. INTONELG, * 

/* ‘ INTGNTMEDLG, INTONEVED, and INTONESMALL 

INT 2 name an array of the four descriptor TWO names. I NIT WO LG , * 



/ * 




I NTIWCMEDLG , 


INTlWGVED, and 


INTIWDSM^LL 




r 


Mu 1 1 


i p 1 e j : an array of 


11 


multiple n ame s . "On e 


" , "Two" . . . "Eleven” 


X 


/* 


red 


factor t : an array 


o f 


the record factors. 


1 , 2 . 5 . and 10 


* 


'/* 


dbas 


e factor : an array o 


f the database factors. 1. 2, and 4 


* 


i * 


num 


attr per red sz: 


an 


array of the 


numbe r 


of attributes per 


* 


f * 




record size ( 


class) 






* 


/ * 

/ 


red 


per blk rel tblr.c 


j : 


a 2x2 array 


of the r 


ecord-block relation 




/ * 


red 


per clus cat tbl r. 


c : 


a 9x4 array 


of the 


cluster category and 


* 



/* the record-block relationship 

base _r cd_&_b lock_di str_table 

lcm table: an array of 1 cm' s indexed by the number of backends 



perform Re c e i v e _u s e r _i npu t ( n-um_be , dsk_trk_sz, max_s t or age_dsk_s z ) : 
perform I n i t i a 1 _c a 1 c u 1 a t i ons ( numbe ; dsk_trk sz, max storage sz): 
perform Ma k e _Temp 1 a t e F i 1 e s ( ) ; 

pe r f orm Mak e _De s c r i pt o r _Fi 1 e s ( num be, dsk trk sz.key num reds. 

ma x dsk storage); 

pe r f orm Mak e _Re c ord_Fi 1 e s ( num_be . key numreds); 
perform Generate Trans Mi x F i 1 e s ( num_be ) : 
perform Format test bmark dbsfnum be. num config): 
p e r f o rm Clean Up ( ) ; 
end task; 
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procedure Receive user input ( output num be. dsk t rk_sz . 

max storage dsk sz) 

perform GetSNumber of Backends (num be): 
per form Get SDi sk Track Size (dsk trk sz); 
pe r f orm Ge t SDi s k Storage Size (max storage dsk sz) 
end procedure 
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module Get$ 



procedure Number of Backends ( output: num be) 

'* num be is the number of backends solicited 
j * f r om the user 

end procedure; 



procedure Disk Track Size ( output : dsk trk sz); 

/ * dsk trk sz is the disk track size of the 
/* system solicited from the user 

end procedure ; 



procedure DiskStorageSize ( ou tput: max storage dsk sz): 

f* max storage dsk sz is the maximum size of 
/* the disk storage (in bytes). This value is 
/* inputted by the user 



end pr oc edur e ; 



procedure Initial calculations ( ) 



e nd 



pe r f orm 
pe r f o rm 
perform 
pe r f orm 
pe r f o rm 
perform 
perform 
pe r f orm 
pe r f orm 
procedure 



Cal cu 1 at eSConfigur at i on? ( num be. num config): 

Ca 1 c u 1 a t e § 1 cm( num be. 1cm); 

Ca 1 c u 1 a t e SFour Record Sizes (dst irk sz); 

Ca 1 c u 1 a t e SAv a i 1 Disk Storage (max storage dsk sz); 
Ca 1 c u 1 a t e SDa t abas e Mu 1 t i p 1 e ( 1 cm) : 

Cal cu 1 at e $Dat abas e _S i z e s ( dbm. avail_dsk storage); 
Calculate SC ale _num_a t tr_for_ea_rcd_sz ( ) ; 

Ca 1 c u 1 a t e SNum attr for ea red _ s 2 ( ) ; 

Calculates Key n um r c d s ( ) : 
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module Calculated 



function Configurations ( 1 npu t : num be . 

output: num config), 

num config = 2*(num be) - 1: 

/* num_config is the number of test configurations 
/* for each of the databases, small, medium, and 
/* large 

end procedure: 



function lcm ( i npu t : num_be , lcm_info ptr) 

output: 1 cm) ; 

Access table and select the corresponding 1cm value 
for the respective num_be 

/* 1cm is the least corrmon multiple based on the 
/* number of backends, num_be 

end procedure: 



procedure Four RecordSizes ( i npu t : dsk trksz, rcd_f actor 

output: four red szs lrs, ml r s , mr s , srs ) 

Determine the four record sizes based on Strawser ‘ s 
s c heme 

four red sz 0 =dsk trk s z 2 : 

for every other four red sz i j . i ranging 1 to 3 do 

four red szfi = four red sz 0 red factorji : 

end for 

/* the four record sizes are stored in a global array * 
/ * four red sz[], lrs is large record size, 

* ml r s is the medium-large record size. 

/ * mr s is the me d i um record size 
/* srs is the small record size 
/* rcdfactorl! is a global array 

end procedure; 
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function Avail Disk Storage ( i^nput max storage dsk sz. 

output avail dsk sz); 

avail_dsk_sz = 80% of max_s t or ag e _ds k sz 

/* avail dsk _sz is that portion of the maximum disk storage 
/* that is reserved for the database per backend 

end pr oc edure ; 



function Database Multiple ( input : 1 cm. 

output : dbm) ; 

lg red sz = four red sz 0 ; 

d bm = (1cm x 32 x lg rcd_sz); 

* d bm is the database mu 1 t i p 1 e * 

end procedure; 



procedure Database Sizes ( input:, dbm, avail dsk sz, 

output: dbase sz lg db sz, 

med db sz, small db_szj); 

t* The following variables are local */ 
folds 
1 g_db_s z 

folds = avail dsk sz dbm; /* discard the remainder * 

dbase ^sz 0 = (dbm) x folds; 

1 g db s z =dbas e s z 0 : 

dbase_sz 1 = 1/2 of lg db_sz; 

dbase sz 2 i= 1/4 of lg db_sz; 

* dbase sz is a global array 

* lg db sz is the large database size,, db_sz 0l 

* med db sz is the medium database size, db_sz 1 

* small db sz is the small database size. db_sz 2 

end procedure: 
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procedure Calc mini attr for ea red sz( output num attr per red sz ) 

Determine a common divisor to all four record sizes. 

Divide each of the record sizes by the divisor. 

The number 10 may be satisfactory. 

/* num attr_per red sz is a data construct containing 
/ * the respective n umb er of attributes for each of the 
/* record sizes: large, med-lg, medium, small 

end procedure ; 



function Calc key num reds ( output: key num reds); 

key num reds = (dbase s z I sma 1 1 db sz num red classes) 

f our _r cd sz lg redsz ); 

* dbase s z I j is aglobal array * / 

/ * n um red classes is a global constant * / 

* four_rcd_sz[i is a global array */ 
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procedure Make Discriptor files ( input num be. dsk trk sz. 

key num reds max dsk storage, 
o u t p u t_ three descriptor files): 



perform Initialize _c 1 us_c at t abl e ( ) : 

perform Create_record_perblock_relationship_table(dsk trksz) ; 
perform Create red per cluster cat table(): 

perform Create _base r e c o r d_b 1 oc k_d i s t r i bu t i on _t a b 1 e ( n um_be , key_num_rc< 
perform Bac kend t ab 1 e ( num_be ) ; 

perform Wr i te_descriptor_files(n um_b e . k e v _n um_r c d s ) ; 
end procedure; 
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procedure Create records per block relationship table ( i n pu t dsk trksz. 

output red per clus cat table r.c ); 

* The following code creates the Records per Block Relationship Table * 

* which is a global array 

/ * The following variables are local */ 

/* block: the disk track size for memory transfer */ 

block = dsk t r k_s z 

for each record size, f ou r _r c _s z [ i ] do 
red per blk = block /red s z [ i 
write rcd_szij, rcd_per_blk 

end f or ; 



end procedure ; 



procedure Create red per cluster catagory table ( out put : 

rcd_per_clus cat tablejr.c ): 

access block per cluster information ( CCI array) 

and the record-block relationship array which are global 

scalar multiply the records per block relationship table x (CCI array 
and store in rcd_per clus cat tbl array 

write the r/cc table to a file: 

end procedure; 
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procedure Create base record &: block dist table ( input : num be 

key_num_r cds . 

outp ut : bas e r cd_&_bl k_d 1 s t r table r.c ): 

/* base_rcd &:_blk distr_table r*, c is a global structure 
(col 0) Re cord sz in byte column array 

(col 1 )Numbe r o f _b 1 ock s _pe r _c 1 us t e r column array * 

(col 2) Number of records_per_c luster column array * 

(col 3)Total number of clusters column array 
(col 4)Total number_of _r ecords_co 1 umn array 
(col 5 ) To t a 1 ^number _o f _b 1 oc k s _c o 1 umn_ar ray 

(col 6)A structure to another (dynamic) table dependent */ 
on "m" configurations */ 

perform Comp 1 e t e_c o 1 umn _0 ( bas e _r c d_&_b 1 k _d i s t r _t ab 1 e r.c] ) ; 

p e r f o rm Comp 1 e t e col umn s 1 & 2(base red Sc blk distr table r , c 1 ) ; 

perform Comp 1 e t e_c o 1 umn_3 (base r cd JL b lk_distr_table r,c]); 

per form Comp 1 e t e _c o 1 umns _4(base red _<^_b 1 k _d i s t r t ab 1 e r , c ) ; 

perform Comp 1 e t e _c o 1 umn s _5 ( bas e _ r c d _&_b 1 k distr table r,c ); 

pe r f orm Comp 1 e t e col umn 6(base red & blk distr t able r,c]); 

end pr oc edur e ; 



/* 
/ * 
/ * 
/* 
/* 
/* 
r 

f * 
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procedure Complete columns 1 & 2() 

access red blk distri table r,c 

copy rcc table r , c (large record size size only) to columns 2 & 3 in 
rc d_&_bl k_d i s t r i _t ab 1 e r,cj; 

/* writes r/cc table to bas e_rcd_&_bl k_t abl e */ 
end procedure; 
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procedure Complete column 3( input key nurn red?. 

output : base_rcd_<L: blk distr table r.c ): 

* Comp 1 e t e s col umn 3 of the base table 

* Performs the cluster distribution for the nine cluster catagories * 

/ * The following are local variables: 

r is the total records per cluster suirmed over 
luster categories * / 

ifies which cluster categories are to be decre- * / 

1 cluster to ensure proper distribution */ 

a flag indicating whether or not all 9 cluster */ 

are to receive the exact s ame n umb er of * / 

* 

is the n umb er of clusters per cluster category * / 
t t 1 _r cds _d i s t r = Ca 1 c u 1 a t e _t r d ( ) ; 



/* 
/ * 
/* 
/ * 
/* 
/* 
/ * 

/ * 



ttl reds dist 
all nine c 
case_id ident 
men ted by 
equal _d i s t is 
categories 
clusters 
cluster size 



cluster _size = Calculate _c luster size (key num reds. ttl rcds_dist 

case_id. equal dist); 

access the column 3 of the red &: blk distri table , 

" total number of clusters 1 ' column 



base_rcd_& blk distr_table[r,3 

for each cluster catagory in the 
base red & blk distr table r,3 



above column, r ranging from 1 t 
= cluster size: 



o 9 do 
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if equal _d 1 s i is false 

access the num of reds per cluster column (column 2) 
in the red 1 k _d i s t r i _t ab 1 e r . 2 

for case i d ( i ) 



i = 


1 


subtract 


1 


f r om 


cluster 


catagory 5, 


1 t 0 


9 catagor i es 


i = 


2 


subtract 


1 


f r om 


cluster 


catagor 


i e s 


1 and 


9 






i = 


9 

O 


subtract 


1 


f r om 


cluster 


catagor 


i e s 


1 . 5 and 9 






i = 


4 


subtract 


1 


f r om 


cluster 


cat agor 


i e s 


1,2.8 


and 9 






i = 


5 


subtract 


1 


f r om 


cluster 


catagor 


i e s 


1.2,5, 


, 8 and 


9 




i = 


6 


subtract 


1 


f r om 


cluster 


c at agor 


i e s 


1,2.3 


,7,8 and 


9 


i = 


7 


subtract 


1 


from 


cluster 


cat agor 


i e s 


1,2,3, 


,5,7.8 


and 9 


i = 


8 


subtract 


1 


f rom 


cluster 


cat agor 


i e s 


1,2,3, 


,4.6,7, 


,8 


and 9 



end case 

end proc edure 
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function Calculate trrd () 



s urn the total n umb er of records in the large rcord size col umn 
of the red _per_c lus _cat_table(rcc table); this value is the 
t t 1 _r cd s _d i s t (total records for distribution) 

/* use for loop, index from 1 to 9 and sum row entries assigning * 
/* sum total to 1 1 1 _r c d s _d i s t * / 

end procedure: 

function Calculate cluster size ( i npu t : key_num reds , ttl_rcds_dist. 

case__id. equal d i s t , 

output : cluster size. 

equal distribution, case id): 



/ * Th e foil o w ing are local variables * j 

/* red overflow is the result of unequal distribution *f 

j * record deficit is the n umb er of records short for * f 

even cluster distribution * / 

avg_num_rcds_per_c luster */ 

initialize equal _distribution to false; 
num clus cat =9: 

cluster _size = ( key _num_r c d s / ttl rcds_dist); 

red overflow = key num reds - (cluster size x ttl reds dist); 

if (red overfl o w = 0 ) 

equal distribution = true; 



else 



cluster size = cluster size — 
equal distribution = false: 
avg num reds per cluster = tt 
record deficit = ttl reds dist 
case id = record deficit avg 



1 ; 

1 reds dist / n um clus cat; 
• red overflow; 
num reds per cluster; 



* red overflow is the record overflow if the number of 

records cannot be evenly distributed into an equal number of *, 
clusters considering the original cluster^size calculation 

’"avg num reds per cluster is the average number of records per cluster 
*num clus cat is the number of cluster catagor ies (global constant) 
*case id is the case identifier which determines which cluster catagorie 

* will be adjusted in size. i.e. decremented by a single cluster 



end procedure; 
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procedure Complete col 4 () 

access red blk distri table; 

to calculate the value for each row entry in col umn 4 
multiply the corresponding row entry from column 2 by 
the corresponding row entry from column 3; 

end procedure ; 



procedure Complete col_5() 

access r cd_&_b 1 k _d i s t r i _t ab 1 e ; 

to calculate the value for each row entry in column 
multiply the corresponding row entry from column 1 by 
the corresponding row entry from column 3; 

end procedure: 
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o* ■ 



procedure Complete col 6(num be) 

* The following variables are local 

x num_r t r _c onfig s is the number of rtr configurations */ 

* 1 1 1 n um blks is the r ow entry, total n umb er of blocks * 

* extra blocks is the number of blocks remaining follow-*/ 

ing even distribution of total number of blocks 
/* across the backends 

* block distribution is the block distribution per * / 

backend */ 

access red & bik distri tablejr,6 : 
n um rtr_configs = n um be; 
do the 1st configuration 

copy coi_5 to col_6 
if m > 1 do 

for configurations(i) ranging fr om 2 ton um rtr configs do 
create i col umn s 

for each row entry , r do r ranging from 1 to 9 ■ 

* 9 t i me s, once for each cluster cat agory * ! 

read t t I _num of blks (row entry) of r cd_&r_b 1 k_d i s t r i table 

extra blocks = t t 1 n um_o f b Iks \DD i ; 

block distribution =■ ttl num blks / i; 

for ea be column entry, c do c ranging from 1 to i 

be col entry r.c = block _distri but ion; 

end for 
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if extra blocks = 0 

* do nothing 



else 



for x = 1 to extra blocks 

be^column r,y = be_column r , y j 
y = y 1 ; 
if y = i 



y = i ; 

X = X -r 1 : 

end for 



end i f 
end for 



end for 
end procedure 
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procedure Write discript or files ( output: 3 descriptor files): 

* t name is a global array containing the four template names 

* I NT 1 name is a global array containing the four IlNTxxl discriptor names 

* INT 2_name is a global array containing the four INTxx2 discriptor names 

* red factor is a global array containing the four record factors 

* dbase factor is a global array containing the three database factors 

for each dbase_factor db . db ranging from 1 to 3 do 

write to a file 

da t abas e __i d_n ame = TEST 
TEMP c s" 

for t name [ j j j ranging from 1 to 4 

write the character ! , and t name [j] 
end for 

the character @ 



access b as e _r c d _&_b 1 k d i s t r t ab 1 e [ r , c 
for INT 1 name i , , i ranging from 1 to 4 do 
write the INT 1 name [ i .a, i 
* '’a” is the type of attribute, i stands for integer * 

first value = 1; 
second value = 0: 

factor = rcdfactor i * dbase_factor db 
for base red blk table r , 4 , r ranging fr om 1 t o 9 d o 
temp value = base red blk distr table r,4 * factor 
second value = second value temp value 
write first value, second value 
first value = second value — 1 
end for 

write the character (Q 

end for 
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for INT 2 name 1 i ranging from 1 to 4 do 
write the INT 2 name i . a. i 

/* n a u is the type of attribute, i stands for integer * 
factor = red factorji, * dbase factorfdb 
w = 0 

for each entry in bas e _r c d _&_b 1 k _d i s t r i _t ab 1 e r,2 r ranging 
f r om 1 t o 9 d o 

x = base_rcd_& blk distri table r , 2 * red f actor i 

tnc = base red & blk distri table r,3 * db factor db 
for each value y, y ranging from 1 to tnc do 
first _value = w -t- (x*y) - (x - 1) 

second_value = w ^ (x*y) 
write first _value, second value 

end for 
w = w x * y 
end for 

write the character @ 

end for 

write the character $ 

send file to directory placing routine 
end for (dbase factor loop) 
end procedure 
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procedure Write record files( input: num be. key num reds. 

ou tpu t : record files): 

The following are global data elements: 

* num at tr per red sz is agiobal array containing */ 

the number of attributes per record class 
/ * key num reds is a global variable; the number of records from 

the large record class from the small database set. */ 

/ * numbe is the number of backends * / 

/* t name is a global array of the template names */ 

/* The following are local data e-iements: */ 

/* max_s t r 1 ng_v a 1 ue is the max number of string attributes for a given */ 

recordsize * / 

/ * str n um is a loop index me a n i n g n umb er of string attributes * / 

for each dbase factor db db ranging fr om 1 to 3 do 

for each mu ltiplei. i ranging fr om 1 ton um_b e do 

write to a file 

database id name = TEST 
the character @ 

for t^name j j ranging from 1 to 4 do 

factor = rcd_f actor j * db_factor db 
for value = 1 to key n um reds * factor do 



write t n ame [ j j 

write value, value, mu 1 t i p 1 e i ] , 

max_str ing value = num_at t r_per_rcd_sz j - 4 

for str n um =1 to max string value do 

write Xxxxxxxxx 



end for 



end for 

write the character © 



end for 

l f mu ltiple i > 1 then 

get previous file written: append current file 

to previous file 

send file to directory routine 



end pr oc edu r e 
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perform Generate trans mixes) input . num be. 

output^ 12 transaction mix files): 

Twelve transaction mix files are generated, one per record class * 
foreachofthethreedatabases * / 

/* Twelve transaction mix report files are generated, one per */ 

/* record class for each of the three databases * / 

/* Twelve transaction mix workload files are generated, one per 
/ * record class for each of the three databases * / 

For each of the three databases, db do 

Perform Gen_db trans mixes)) 



end procedure; 



Procedure Gen db_trans_mixes() 

for each of the four record classes, rs do 

open a file for the transaction mixes; 

open a file for the transaction mixes report: 

open a file for the transaction mixes workload: 

factor = dbase factor'db’ * red factor rs : 

perform Gen reqset 1 ( ) ; 
perform Gen reqset2(): 
perform Gen reqset3(); 
perform Genreqset4( ) : 
p e r f o rri Gen reqsetS) ) : 
perform Gen reqset6(): 

close all files 

end for 1 oop 

end procedure; 
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Procedure Gen reqsetl() 

print table headings for the report file and workload file 

p e r f o rm Re q u e s t 1 ( ) : 
perform Request2(): 
p e r f o rm Re q u e s t 3 ( ) ; 

end procedure; 



RequestlQ 

OBJECTIVE: Retrieve 

Assign the a-ppropriate template and descriptor names. 

Given the number of backends determine the minimum number of 
records to be retrieved such that each backend provides the 
s ame n umb er of records. Th e target cluster category is 1. 

The records to be retrieved should be taken from the middle 
of the cluster category. 

The values selected are for the INIGNExx descriptor. 



For the work load: 



Calculate the number 


o f 


records 


accessed by 


counting all 


the 


records in the first 


category. 












Calculate the number 


o f 


records 


r e t r i 


i eved by 


mu 1 t 


i ply i ng 


the 


number of backends 


by 


the n umb e 


r of 


records 


pe r 


cluster 


for 


cluster category 1. 


Remember on 


ly i 


cluster 


pe r 


backend 


i s 



retrieved, 
end procedure; 



Reqnest2() 

OBJ ECT I VE : Retrieve 
Request? has been stubbed, 
end pr oc edu r e ; 



Reqnest3() 

OBJECTIVE: Retrieve 

Assign the appropriate template and descriptor names. 

Determine 25% of the database, and calculate the INTIWOxx descriptor 
value . 
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For the work load 

Determine which cluster cat egor\ earmarks the INTWOXX value 
culate the volume of the database accessed by counting all 
in that cluster plus all preceding clusters then divide by 
total n umb er of records in all 9 categories. 

The volume of the database retrieved is 25%. 



and cal * 
the record 
the 



107 



Procedure Gen reqset2() 

print table headings for the report file and workload file 

perform Request4() : 
per form Requ e s t 5 ( ) : 
perform Request6(); 

end procedure ; 



Request 4 () 

OBJECTIVE: Update 

Assign the appropriate template and descriptor names. 

Calculate 1/8 of the given database and assign that value to 
the IXIVVOxx descriptor attribute. 

For the workload: 

Determine 1/8 of the database. Determine which cluster 
category earmarks 1/8 of the database and calculate 
the volume of the database accessed by counting all the records 
in that cluster plus all preceding clusters, then divide that 
sum by the total number of records in all categories. 

The volume of the database updated is 12.50%. 

end proc edur e : 



Request5() 

OB JECT I VE : Update 

Assign the appropriate template and descriptor names. 

Calculate 1 4 of the given database and assign that value to 
the IMWOxx descriptor attribute. 

For the wo r k 1 oad : 

Determine 1/8 of the database. Determine which cluster 
category earmarks 1 4 of the database and calculate 
the volume of the database accessed by counting all the records 
in that cluster plus all preceding clusters, then divide that 
sum by the total number of records in all categories. 

The volume of the database updated is 25.00%. 

end procedure; 



108 



Request6() 



OBJECTIVE: Update 

Assign the appropriate template and descriptor names. 

Calculate 1/2 of the given database and assign that value to 
the IIVIWOxx descriptor attribute. 

For the wo r k 1 oad : 

Determine 1/2 of the database. Determine which cluster 
category earmarks 1/2 of the database and calculate 
the volume of the database accessed by counting all the records 
in that cluster plus all following clusters, then divide that 
sum by the total number of records in all categories. 

The volume of the database updated is 50.00%. 

end procedure: 
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Procedure Gen reqset3() 

print table headings for the report file and workload file 

perform Request7(): 
p e r f o rm Re q u e s t 8 ( ) : 
perform Request9(); 
perform Request 1 0 ( ) ; 
perform Requestll(): 

end procedure; 



RequestTQ 

OBJECTIVE: Retrieve 

Half of the database is to be accessed and 1/8 is to be retrieved. 

Assign the appropriate template and descriptor names. 

Calculate W2 of the given database and assign that value to 
the INTONExx descriptor attribute. 

Assign STRING001 attribute the value Oneeighth. 

For the workload: 

Determine 1 2 of the database. Determine which cluster 
category earmarks 12 of the database and calculate 
the volume of the database accessed by counting all the records 
in that category plus all preceding categories, then divide that 
sum by the total number of records in all categories. 

Th e v o 1 ume of the database retrieved is 12.5 0^ . 

end procedure; 



Request8() 



OBJECTIVE: Retrieve 



A 1 1 


of 


t h e 


da t abase i s 


to be accessed 


and 1 ' 8 is 


As s i 


i gn 


t he 


appr opr i at e 


t emp late n ame . 




As s i 


i gn 


STRING001 


at t r i 


ibute the value 


On e e i g h t h . 


Fo r 


the 


work load: 








The 


vo 1 


ume 


of the 


database accessed i 


s 100.00^. 


The 


v o 1 


ume 


of the 


d at abas e retrieved 


is 12.5 0 % 



to be retrieved. 



end procedure; 
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Request9( ) 



OBJ EXIT I VE: Retrieve 

All of the database is to- be accessed and 1/4 is to be retrieved. 



Assign the appropriate template name. 

Assign STRING005 attribute the value Onequartr. 



For the workload: 

The volume of the database 
The volume of the database 



accessed is 100.00%. 
retrieved is 25.0 0% . 



end pr oc edur e : 

RequestlOQ 

OBJEXITIVE: Retrieve 

All of the database is to be accessed and 1/2 is to be retrieved. 

Assign the appropriate template name. 

Assign STRING010 attribute the value Onehalf. 

For the workload: 

Th e v o 1 ume of the database accessed is 100.0 0%. . 

The volume of the database retrieved is 50.00%. 



end procedure: 

RequestllQ 

OBJEXITIVE: Retrieve 



Half of the database is to be accessed and 1/2 is to be retrieved. 

Assign the appropriate template and descriptor names. 

Calculate 1/2 of the given database and assign that value plus 1 to 
the I NTIVVXx x descriptor attribute. 

Assign STRING010 attribute the value Onehalf. 

For the workload: 

Determine 1 2 of the database. Determine which cluster 
category earmarks the INTV\OXX value of the database and calculate 
the volume of the database accessed by counting all the records 
in that category plus all following categories, then divide that 
sum by the total number of records in all categories. 

The volume of the database retrieved is 50.00%. 

end procedure; 
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Procedure Gen reqset4() 

print table headings for the report file and workload file 

perform Reque s t 1 2 ( ) : 
perform Request 1 3 ( ) : 
pe r f orm Requestl4( ) : 

Note that the workload for Request4 involves the following: 



Number 


o f 


c lusters 


ex ami ned 


by 


t he 


first ( 


source) retrieve 


Number 


o f 


records 


accessed 


by 


t he 


source 


r e t r i 


i eve 


Numb e r 


o f 


records 


relevant 


t o 


the 


source 


ret r i 


i eve 


Number 


o f 


clusters 


ex ami ned 


by 


the 


sec ond 


(target) retrieve 


Number 


o f 


records 


accessed 


by 


t he 


target 


retr i 


i eve 


Numbe r 


o f 


records 


relevant 


t o 


t he 


target 


retri 


i eve 



Size of the resulting record set (in n umb er of records) 



end procedure; 



Requestl2() 

OBJ ECT I VT : Retrieve- Corrmo n 

To access a small selection of records in the first cluster 
category of back to back record classes and return those 
records that share corrmo n INTTQNExx attributes within each 
respective record class. 

Assign the appropriate template and descriptor names. 



The descriptor values from Requestl are used for the first retrieve 
and the upper range value is multiplied by the record factor 
as the value for the second retrieve. 



For the wo r k 1 oad : 

The workload for the first retrieve is 

The workload for the second retrieve i 
the number of clusters identified in c 
determining the appropriate number of 
category . 

Th e result record set is the s ame as t 
by the source retrieve. 



the s ame as Requestl. 
s calculated by determining 
luster category 1 and 
records accessed in that 

he number of records relevant 



end procedure. 
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Request 13 () 



OBJECTIVE: Re t r i e v e - Corrmon 

For each retrieve, access all the records fr om the record class 
such that only 1/2 the records are relevant to that retrieve. 

Retrieve those records that share corrmon INTTONExx attributes 
within each respective record class. 

Assign the appropriate template values. 

Assign the value "Onehalf" to the STRING010 attribute. 

For the work load: 

All cluster categories are examined by both the source and 
target retrieve. Make the appropriate assignments. 

All records are accessed by both the source and 
target retrieve. Make the appropriate assignments. 

The number of records relevant is half of the total number of 
records for each given record class. 

The result record set is the s ame as the n umb er of records relevant 
by the source retrieve. 

end pr oc edur e . 
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Requestl4() 



OBJECTIVE: Re t r i eve -Corrmon 



For the source retrieve, access all the records fr om the record 
class such that only 1/2 the records are relevant to that retrieve. 

For the target retrieve, access 1 / 1 6 the records of the second 
record class. 

Retrieve those records that share corrmon INTlONExx attributes 
within each respective record class. 

Assign the appropriate template and descriptor names. 

Determine half the number of records for the first retrieve and 
assign this value to the IOTONExx descriptor attribute for 
the first retrieve and also as the upper range value INTONExx 
descriptor in the second retrieve. 

Determine 1/8 of the number of records for the first retrieve and 
subtract this value from the (half the number of records) value. 
This new value is the lower range value for the IIVTOVExx 
descriptor attribute of the target retrieve. 



For the wo r k 1 o ad : 

For the first retrieve: 

Identify the target cluster category and s um up all the clus- 
ters in this and all preceding categories for the total number 
of clusters examined. Do the same for all the records accessed. 
The number of records relevant to the source is exactly 1/2 the 
total n umb er of records. 

For the second retrieve: 

Identify the target cluster category. The total n umb er of 
clusters in this category is the number of clusters examined. 

Do the same for the number of records accessed by the target 
request. The number of records relevant to the target request 
is the difference between its INTONExx descriptor values. 

The result record set is the same as the number of records 
relevant by the target retrieve. 

end pr oc edur e : 
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Procedure Gen_reqset5() 



print table headings for the report file and workload file 

perform Requestl5() : 
perform Requestl6() : 

end procedure; 



Request 15 () 

OBJECTIVE: Insert (a record into an existing cluster) 

Assign the appropriate template and descriptor names. 

Determine the number of backends in the system: the value for 
the multiple attribute is one greater than the number of backends. 
Access the number of attributes per record class and write the 
appropriate number of string values, Xxxxxxxxx . 

For the work 1 oad : 

No clusters are examined and the database is not accessed. 

One record is inserted. 

end procedure; 



Requestl6() 

OBJECTIVE: Insert (a record into a new cluster) 

Assign the appropriate template and descriptor names. 

To determine the new value for the descriptor attribute: 

Determine the number of records in the first cluster category. 

Mu ltiply this value by 100. then divide by the n umb er of clusters 
in the first cluster category. The result is the new value for 
the descriptor attribute. 

The multiple attribute’s value is ’’One 11 

Access the number of attributes per record class and write the 
appropriate number of string values, Xxxxxxxxx. 

For the work load: 

No clusters are examined and the database is not accessed. 

One record is inserted. 

end procedure; 
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Procedure Gen reqset6() 



print table headings for the report file and workload file 

perform Reque s t 1 7 ( ) ; 
p e r f o rm Requestl8( ) ; 
perform Requestl9() ; 
perform Request20(-; 
perform Request2l(); 
perform Request22(); 
p e r f o rm Request23( ) ; 
perform Request24(); 



end procedure; 



Request 17 () 

OBJECTIVE: Delete 

Assign the appropriate template and descriptor names. 

The descriptor attribute values are the same as Request 1. 
For the wor k 1 oad : 

The workload is the same as request 1. 
end procedure; 



Request 18 () 

OBJECTIVE: Delete 

Request 18 has been stubbed, 
end procedure; 



Request 19() 

OBJECTIVE: Delete 

Assign the appropriate template and descriptor names. 

The descriptor attribute value is calculated as follows: 
Subtract 2 5% of the total records fr om the total records value. 
The result is the descriptor value. 

For the wor k 1 oad : 

To determine the volume of the database deleted: 

S urn all the records between the descriptor attribute value 
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and the total records Divide this sum by the total records 
value. 

To determine the volume of the database accessed: 

Identify the cluster that stores the target descriptor value. 

Sum all the records in this cluster plus all the records in those 
clusters that follow up through the last cluster category. Divide 
the sum by the total number of records. 

end pr oc edur e ; 



Request20() 

OBJECTIVE: Delete 

Assign the appropriate template and descriptor names. 

The descriptor attribute values are the same as Request 7. 
The string assignment is the same as Request 7. 

For the workload: 

The workload is the same as request 7. 
end procedure: 



Request2l() 

OBJECTIVE: Delete 

Assign the appropriate template name. 

The string assignment is the same as Request 8. 
For the workload: 

The workload is the same as request 8. 
end pr oc edur e ; 

Request22() 

OBJECTIVE: Delete 

Assign the appropriate template name. 

Th e string assig nme n t is the s ame as Request 9 
For the wor k 1 o ad : 

The workload is the same as request 9. 
end pr oc edur e ; 
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Request23() 



OBJECTIVE: Delete 

Assign the appropriate template name 

The string assignment is the same as Request 10. 

For the workload: 

The workload is the same as request 10. 
end procedure: 



Request24() 

OBJECTIVE: Delete 

Assign the appropriate template and descriptor names. 

The descriptor attribute value is the same as Request 11. 
The string assignment is the same as Request 11. 

For the wo r k 1 o ad : 

The workload is the same as request 11 
end pr oc edur e : 
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procedure Format test benchmark dbs ( input: num be. num config. 

output: lg d b set. me d d b set. 

sma 11 db set): 

* This procedure generates the test benchmark databases 
/ * for the repor t * / 

for each database size, 

write a separate database set file (large, med. small) such that 
db sz ij, i ranging fr om 0 to 2 do 

/* The following code will generate the response time */ 

/* reduction configurations * / 

for b ends in use = 1 to num be do 
config num = num_be 

write c on fig _num 

mbytes per be = (db sz i J / 4 ) / num be 

/* dividing the db s i z e [ i by 4 distributes */ 

/* the database evenly to each of the record*/ 

* classes * / 

write b ends_in use 

for each red sz'pj, p ranging from 0 to 3 do 

numred per be ^mbytes per be / red sz p 
write re d s z j p . , 

write num red per be, mbytes per be 

e nd do 

write db_s z [ i ] 



end do 
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Th e foil ow ing code will generate the response t ime 
invariance configurations 



config num = config num — 1: 
b ends in use = 2; 

while b ends in use is less than or equal to num be do 
write config num, b_e nd s i n _u s e 
mby t e s _pe r _be = db_szji]/4 

db sz_in use = (mbytes per be) * (b ends in use) 

for each rcdsz p; , p ranging from 0 to 3 do 

•num_r c d_pe r _be =mbytes per be / red sz p 

write rcd_sz p], n um_r c d _pe r _be , 
write mbytes per be 



end do 

write db_sz_in_use 

b ends in use = b ends in use + 1 
config num = config num — 1 
end while 

end for; 
end proc edur e : 

b ends in use is the number of backends used for a specific configuration.* 
con fig num is a specific configuration (number). 

* red sz is one of four record sizes fr om four red sz 

* num_r cd_pe r be is the number of records per backend (per record class) * 
/* db_sz_in use is the database size created for the invariance configur- 

* a t i o n s 

/* mbvtes_per be is Mbytes per backend (per record class). 
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APPENDIX B: CAD GENERATED REPORTS 



The reports contained in this appendix comprise only those generated for 
the small database set. The CAD system generates similar reports for the large 
and medium database sets as well. The reports are based upon a system defined 
to have the following: 



maximum disk storage: 


375 


Megabytes 


available disk storage: 


300 


Megabytes 


disk track size: 
number of backends: 


4000 

O 

o 


bytes 



The first set of reports are generated by the Test-Database Component. The 
second set of reports are generated by the Test-Transaction Mix Component. 
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A. TEST-DATABASE COMPONENT REPORT 



THE RECORDS-PER-BLOCK RELATIONSHIP 



Record Sizes 
in Bytes 


Records 
per Block 


2000 


2 


1000 


4 


400 


10 


200 


20 



THE NUMBER OF RECORDS PER CLUSTER CATEGORY TABLE 



Blocks 

per 

Cluster: 


Record Size in Bytes: 


2000 


1000 


400 


200 


2 


4 


8 


20 


40 


9 

O 


6 


12 


30 


60 


4 


8 


16 


40 


80 


5 


10 


20 


50 


100 


6 


12 


24 


60 


120 


7 


14 


28 


70 


140 


8 


16 


32 


80 


160 


9 


18 


36 


90 


180 


10 


20 


40 


100 


200 
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SMALL DB TEST CONFIGURATION? 



Configuration 


Number 


Record 


Number of 


My bytes 


Database 


Number 


of 


Size in 


Records per 


per 


Size in 




Backends 


Bytes 


Backend 


Backend 


Mbytes 


1 


1 


2000 


9372 


18744000 








1000 


18744 


18744000 








400 


46860 


18744000 








200 


93720 


18744000 


74.976 


2 


2 


2000 


4686 


9372000 








1000 


9372 


9372000 








400 


23430 


9372000 








200 


46860 


9372000 


74.976 


3 


o 

o 


2000 


3124 


6248000 








1000 


6248 


6248000 








400 


15620 


6248000 








200 


31240 


6248000 


74.976 


4 


2 


2000 


9372 


18744000 








1000 


18744 


18744000 








400 


46860 


18744000 








200 


93720 


18744000 


149.952 


5 


9 


2000 


9372 


18744000 








1000 


18744 


18744000 








400 


46860 


18744000 








200 


93720 


18744000 


224.928 
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RECORD BLOCK DISTRIBUTION TABLE 
SMALL DATABASE 
THE NUMBER OF BACKENDS: 3 



Record 


Number of 


Number of 


Total 


Total 


Total 


Size 


Blocks 


Records 


Number 


Number 


Number 


in 


per 


per 


of 


of 


of 


Bytes 


Cluster 


Cluster 


Clusters 


Records 


Blocks 


2000 


o 


4 


86 


344 


172 ' 




O 


6 


87 


522 


261 




4 


8 


87 


696 


348 




5 


10 


87 


870 


435 




6 


12 


87 


1044 


522 




7 


14 


87 


1218 


609 




8 


16 


87 


1392 


696 




9 


18 


87 


1566 


783 




10 


20 


86 


1720 


860 


1000 


o 


8 


86 


688 


172 




•> 


12 


87 


1044 


261 




4 


16 


87 


1392 


348 




5 


20 


87 


1740 


435 




6 


24 


87 


2088 


522 




t 


28 


87 


2436 


609 




8 


on 


87 


2784 


696 




9 


36 


87 


O 1 oo 


783 




10 


40 


86 


3440 


860 


400 


o 


20 


86 


1720 


172 




•> 


30 


87 


2610 


261 




4 


40 


87 


3480 


348 




5 


50 


87 


4350 


435 




6 


60 


87 


5220 


522 




7 


70 


87 


6090 


609 




8 


80 


87 


6960 


696 




9 


90 


87 


7830 


783 




10 


100 


86 


8600 


860 


200 


o 


40 


86 


3440 


172 




o 


60 


87 


5220 


261 




4 


80 


87 


6960 


348 




5 


100 


87 


8700 


435 




6 


120 


87 


10440 


522 




7 


140 


87 


12180 


609 




8 


160 


87 


13920 


696 




9 


180 


87 


15660 


783 




10 


200 


86 


17200 


860 
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THE RTR CONFIGURATION NUMBER 1 



| Number of Blocks BE 




THE RTR CONFIGURATION NUMBER 2 



Number of Blocks / BE 


86 


86 


131 


130 


174 


174 


217 


218 


261 


261 


305 


304 


348 


348 


391 


392 


430 


430 



THE RTR CONFIGURATION NUMBER 3 



Number of Blocks / BE 


58 


57 


57 


87 


87 


87 


116 


116 


116 


145 


145 


145 


174 


174 


174 


203 


203 


203 


232 


232 


232 


261 


261 


261 


286 


287 


287 
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B. 



TEST-TRANSACTION MIX COMPONENT REPORTS 



SMALL DATABASE 
LARGE RECORD CLASS 
RECORD SIZE: 2000 



REQUEST SET 1 



Request 

Number: 


RETRIEVAL Request 
Queries: 


1 


((TEMP=Templg)and(INTONELG>=109)and(INTONELG< = 120)) 


O 


(( TEMP= Tern plg)and(INTTWOLG <=2343)) 



REQUEST SET 2 



Request 

Number: 


UPDATE Request 
Queries: 


I 4 


((TEMP=Templg)and(INTTWOLG< = 1172))(STR000001 = Oneeighth) 


5 


(( TEMP = Templg)and(INTTWOLG< =2343 ))(STR000005 = Onequartr) 


6 


((TEMP=Templg)and(I.NTTWOLG > 4686))(STR000010 = Onehalf) 
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REQUEST SET 3 



Request 


RETRIEVAL Request 


Number: 


Queries: 


7 


(( TEMP = Temp)g)and(INTTWOLG <= 4686)and(STR000001 = Oneeighth)) 


8 


((TEMP=Temp)g)and(STR000001 = Oneeighth)) 


9 


(( TEMP=Temp)g)and(STR000005 = Onequartr)) 


10 


((TEMP=Temp)g)and(STR000010 = Onehalf)) 


11 


((TEMP=Temp]g)and(INTTWOLG >= 4687)and(STR000010 = Onehalf)) 



REQUEST SET 4 



Request 

Number: 


RETRIEVE-COMMON 
Request Specifications: 


12 


RETRIEVE((TEMP = Tempig)and(INTONELG> = 109) 
and(INTONELG< = 120) 

COMMON(INTONELG.INTONEMEDLG) 

RETRIEVE((TEMP=Tempmedlg)and(INTONEMEDLG<=240)) 

(INTONEMEDLG) 


13 


RE TRIE VE( ( TEMP = Templg)and(STR000010= Onehalf) 
(1NTTWOLG) 

COMMON(INTONELG.INTONEMEDLG) 

RETRIEVE((TEMP=Tempmedlg)and(STR000010=Onehalf) 

(INTTWOMEDLG) 


14 


RE TRIE VE((TEMP=Templg)and(INTONELG< =4686)) 
(1NTONELG) 

COMM ON(INTTWOLG. INTTWOMEDLG) 

RETRIEVE((TEMP=Tempmedlg)andf INTONEMEDLG > = 3515) 
and(INTONEMEDLG<=46S6))(INTONEMEDLG) 
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REQUEST SET * 



Request 

Number: 


INSERT Request 
Queries: 


15 


(< TEMPLATE. Templg>.<INTONELG.l>.<INTTW OLG,l>,<Mu)tiple,Four>, 
< S TROOOOO 1 , Xxxxxxxxx > <STR0001 96.Xxxxxxxxx > ) 


16 


(<TEMPLATE.Templg>,<INTONELG.l>.<INTTWOLG,400>,<Multiple,One>, 
cSTROOOOOl, Xxxxxxxxx >,....<STR000196.Xxxxxxxxx>) 



REQUEST SET 6 



Request 


DELETE Request 


Number: 


Queries: 


17 


((TEMP=Templg)and(INTONELG> = 109)and(INTONELG< = 120)) 


19 


((TEMPLATE = Templg)and(INTTWOLG>= 7030)) 


20 


((TEMP = Templg)and(INTTWOLG <= 4686)and(STR000001 = Oneeighth)) 


21 


((TEMP=Templg)and(STR000001 = Oneeighth)) 


oo 


((TEMP=Templg)and(STR000005 = Onequartr)) 


09 


((TEMP=Templg)and(STR000010 = Onehalf)) 


24 


( ( TEMP=Templg)and(INTT W OLG <= 4687)and(STR000010 = Onebalf)) 
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SMALL DATABASE 
LARGE RECORD CLASS 
RECORD SIZE: 2000 



REQUEST SET 1 WORKLOAD 
RETRIEVE QUERIES 



Request 

Number 


Number of 
Clusters 
Examined 


Volume of 

Database 

Accessed 


Volume of 
Database 
Retrieved 


1 


86 


344 records 


12 records 


O 


339 


25.0960 % 


25.0000 % 



REQUEST SET 2 WORKLOAD 
UPDATE QUERIES 



Request 

Number 


Number of 
Clusters 
Examined 


Volume of 

Database 

Accessed 


Volume of 
Database 
Transacted 


4 


ojo 


12.5694 % 


12.5053 % 


5 


339 


25.0960 % 


25.0000 % 


6 


261 


50.0640 % 


50.0000 % 



REQUEST SET 3 WORKLOAD 
RETRIEVAL QUERIES 



Request 

Number 


Number of 
Clusters 
Examined 


Volume of 

Database 

Accessed 


Volume of 
Database 
Retrieved 


7 


521 


50.0854 % 


12.5000 % 


b 


781 


100.00 % 


12.51 % 


9 


781 


100.00 % 


25.00 % 


10 


781 


100.00 % 


50.00 % 


11 


261 


50.0640 % 


50.0107 % 
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REQUEST SET 4 WORKLOAD 

retrieval queries 



Request 

Number 


Number of 
Clusters 
Examined 
by the 
Source 
Request 


Number of | 
Records 
Accessed 
by the 
Source 
Request 


Number of 
Records 
Relevant 
to the 
Source 
Request 


Number of 
Clusters 
Examined 
by the 
Target 
Request 


Number of 
Records 
Accessed 
by the 
Target 
Request 


Number of 
Recoras 
Relevant 
to the 
Target 
Request 


Size of 
the 

Result 
Record 
Set in 
Records 


12 


86 


344 


12 


86 


688 


240 


12 


13 


781 


9372 


4686 


781 


18744 


9372 


4686 


14 


521 


4694 


4686 


87 


1740 


1171 


1171 



REQUEST SET 5 WORKLOAD 
INSERT QUERIES 



Request 

Number 


Number of 
Clusters 
Examined 


V olume of 

Database 

Accessed 


Volume of 
Database 
Inserted 


15 


- 


- 


1 record 


16 


- 


- 


1 record 
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REQUEST SET f> W ORKLOAD 
DELETE QUERIES 



Request 

Number 


Number of 
Clusters 
Examined 


Volume of 

Database 

Accessed 


Volume of 
Database 
Deleted 


17 


86 


344 records 


12 records 


19 


121 


25.07 % 


24.99 % 


20 


521 


50.0854 % 


12.5000 % 


21 


781 


100.00 % 


12.51 % 


oo 


781 


100.00 % 


25.00 % 


23 


781 


100.00 % 


50.00 % 


24 


261 


50.0640 % 


50.0107 % 
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